Answer Set Programming for E-Graph DAG Extraction

Related Articles

electronic graphs (equivalence graphs) are data structures for efficient and compact storage of many equivalent versions of expressions. They are useful for performing compiler optimizations and proving theorems.

Ultimately, one wants to extract a version of an expression out of the box with good properties so that it can be turned into efficient code or hardware material. This problem can be difficult depending on how you want to design it.

Answer set programming is a paradigm for solving satisfiability problems or combinatorial optimization. It offers some unusual modeling capabilities / justification / supported set that are not offered by sat down, CSP, ILPor SMT solving It is precisely this aspect that makes it very useful for a particular model of the extraction problem. Klingo Solver ASP is about a Data log-like deficit gringo that feeds into an SAT-like solver buckle with Intrinsic loop breaking constraints support. It rummy Independently if this idea encourages that it is good, because it is often right.

A method for some cost models is to apply dynamic programming. In this model, the cost of the anode in fists is a function of the minimum cost of its children. Very natural right? You can calculate the best cost as a fixed point calculation, and determine the best anode for each eclass to be the one that currently has the smallest cost. It will come together pretty quickly.

It happens quite often that this framework does not model the actual costs very well, especially when it comes to sharing. Shared subexpressions can be reused cheaply. In this case, what you want to extract is not an optimal AST, but an optimal DAG. The cost function of the DAG is intrinsically more non-local or non-compositional than the cost function of the AST. It may be desirable to choose a locally less optimal subexpression if that expression has more parts to reuse elsewhere.

A simple additive tree cost model is $sum_n c_n m_n$ where $c_n$ is the cost of anode $n$ and $m_n$ is the multiple you chose.

The cost model of a DAG would instead be $sum_n c_n b_n$ where instead $b_n$ is not a 0-1 boolean variable saying whether you have selected the node or not.

In any case, the optimization problem is limited to selecting subgraphs of the graph that form valid trees or valid DAGs. Tree-ness or DAG-ness is a subtle property that is surprisingly difficult/impossible to describe in first-order logic. This is an example where you need a concept of at least a fixed point in your logic.

See for example:

This axiom states that if all your children are valid trees, then you are a valid tree.
Tree(node) = (forall c, child(c,node) => Tree(c)).

But this axiom is loose for the idea of ​​a tree. Tree(C) := true is a valid model of this axiom, which just says that every node is a tree in every graph.

As a second counterexample, consider a graph with a single vertex a and a single self edge a -> a. Clearly this node is not a tree. But this self consistent to tell a It is a tree node according to our bad definition, since indeed all its children are also tree nodes.

Okay, you might try to fix this by making trees have no self-loops, but an extension of this counterexample is to consider a huge loop of arbitrarily large size N. How can you combat this?

What we want to express is that we can find the trees iteratively by first assuming that a leaf (with no children) is a tree, then continuing to grow our class of tree things to include anything with all the children we’ve already determined are trees. This is a very intuitive interpretation of the concept of projection =>But this No The idea of ​​classical first-order logic.

This is exactly what data log or answer set programming does. It :-. Note that until you start using non-stratified negation, programming the answer sets and Datalog are essentially the same (ASP refers to negation “in advance” / consistently. You test an ASP solution by running Datalog on using current derived atoms for positive atoms but guessing a final solution for the atoms the exclusions). Both allow you to derive a minimal model or at least a fixed point of a stack of beam sections.


tree(N) :- node(N), tree(C) : edge((C,N)).

% Answer: 1
% node(1) node(2) node(3) node(4) edge((1,2)) edge((2,3)) tree(4) tree(1) tree(2) tree(3)

It is precisely this feature that makes it possible to demand that only the DAGs be extracted to the ASP solver in a very natural way.

The ASP model begins with some description of a fixed fist described as a stack of relations.

For example, coding of the resulting punch x + y = y + x may look like

% enode(eid, symbol, child_eid...)
enode(2, "x").
enode(3, "y").

However, it’s convenient to not have to deal with different numbers of arguments separately, so instead I encode it using an encoded identifier.

% enode(eid, enode_id, sym)
% child(eid, enode_id, child_eid)



In answer set programming, the marking { a } This is how you represent the option to choose a be true or false. We may select an anode only if its children are selected. This is a DAG property.

% We may choose to select this enode if we have selected that class of all its children.
{ sel(E,I) } :- enode(E,I,_,_), selclass(Ec) : child(E,I,Ec).

It is convenient for modeling to have an auxiliary predicate for eclass selection. We choose eclass if we choose a node in eclass.

% if we select an enode in an eclass, we select that eclass
selclass(E) :- sel(E,_).

We want to disallow any solution where we don’t choose the root.

% It is inconsistent for a eclass to be a root and not selected.
:- root(E), not selclass(E).

This is quite different from saying that we choose the root, which I would write that way

selclass(E) :- root(E).

These statements are classically equivalent, but they are No Equivalent to ASP. I can show the classical equivalence using the invalid ASP identity A => B == not A / B.
forall E, root(E) / not selclass(E) => false == not root(E) / not not selclass(E) / false == not root(E) / selclass(E) and forall E, root(E) => selclass(E) == not root(E) / selclass(E).

If we add this axiom, we would allow choosing a solution where we pick the root but do not require the children of the root to be picked first. This is again a subtle use of ASP modeling, but it is exactly the subtlety that we want to express the required DAGness of the desired solution.

Finally, we want to minimize the cost of the chosen anodes.

#minimize { C,E,I : sel(E,I), enode(E,I,_,C) }.

It seems to help the time solver to add that we don’t expect to select more than one anode per eclass in the minimization model.

% optional constraints to help?
:- enode(E,_,_,_), #count { E,I : sel(E,I)} > 1.

The short nature of the coding reflects that ASP is a very natural framework for this problem. ASP has a number of different solver architectures, but the basic principle of some of them is that they are roughly SAT solvers + learned cycle breaking / support constraints. In most other solvers, these circuit-breaking constraints must be eagerly generated (this is also done in the types of ASP solvers that eagerly fold into SAT).

Max started cooking Rescue gym. I don’t know that it’s fully baked and ready for contributors yet (although I think more benchmark punches they Welcome).

I had to poke around the rust covers a bit to figure out how to run Klingo programmatically, but it’s not that bad. You build a solver, add facts and rules, call a grounder (roughly an over-approximated data log run), then a solver that returns a sequence of better and better solutions.

He seems to be slow on the big issues. There are probably still refinements / unnecessary constraints / symmetry breaking / hints that can speed it up. In addition, providing an upper bound cost from the bottom up may speed it up.

I bet heuristic methods might be better in terms of solution time / solution quality, but still it’s pretty nice.

use super::*;
use clingo::control;

use clingo::ClingoError;
use clingo::FactBase;
use clingo::ShowType;
use clingo::Symbol;
use clingo::ToSymbol;

// An enode is identified with the eclass id and then index in that eclasses enode list.
struct Enode {
    eid: u32,
    node_i: u32,
    op: String,
    cost: i32,

struct Root {
    eid: u32,

struct Child {
    eid: u32,
    node_i: u32,
    child_eid: u32,

const ASP_PROGRAM: &str = "
% we may choose to select this enode if we have selected that class of all it's children.
{ sel(E,I) } :- enode(E,I,_,_), selclass(Ec) : child(E,I,Ec).

% if we select an enode in an eclass, we select that eclass
selclass(E) :- sel(E,_).

% It is inconsistent for a eclass to be a root and not selected.
% This is *not* the same as saying  selclass(E) :- root(E). 
:- root(E), not selclass(E).

:- enode(E,_,_,_), #count { E,I : sel(E,I)} > 1.

#minimize { C,E,I : sel(E,I), enode(E,I,_,C) }.

#show sel/2.

pub struct AspExtractor;
impl Extractor for AspExtractor {
    fn extract(&self, egraph: &SimpleEGraph, _roots: &[Id]) -> ExtractionResult {
        let mut ctl = control(vec![]).expect("REASON");
        // add a logic program to the base part
        ctl.add("base", &[], ASP_PROGRAM)
            .expect("Failed to add a logic program.");

        let mut fb = FactBase::new();
        for eid in egraph.roots.iter() {
            let root = Root {
                eid: (*eid).try_into().unwrap(),
        for (_i, class) in egraph.classes.values().enumerate() {
            for (node_i, node) in class.nodes.iter().enumerate() {
                let enode = Enode {
                    node_i: node_i.try_into().unwrap(),
                    op: node.op.clone(),
                    cost: node.cost.round() as i32,
                for child_eid in node.children.iter() {
                    let child = Child {
                        node_i: node_i.try_into().unwrap(),
                        child_eid: (*child_eid).try_into().unwrap(),

        ctl.add_facts(&fb).expect("Failed to add factbase");
        let part = clingo::Part::new("base", vec![]).unwrap();
        let parts = vec![part];
        ctl.ground(&parts).expect("Failed to ground");
        let mut handle = ctl
            .solve(clingo::SolveMode::YIELD, &[]) // maybe use ctl.optimal_models()
            .expect("Failed to solve");
        let mut result = ExtractionResult::new(egraph.classes.len());
        while let Some(model) = handle.model().expect("model failed") {
            let atoms = model
                .expect("Failed to retrieve symbols in the model.");
            for symbol in atoms {
                assert!( == "sel");
                let args = symbol.arguments().unwrap();
                result.choices[args[0].number().unwrap() as usize] =
                    args[1].number().unwrap() as usize;

            handle.resume().expect("Failed resume on solve handle.");
        handle.close().expect("Failed to close solve handle.");

A really cool and useful tool for working with Klingo is Clinograph. It allows you to visualize solutions directly from within Klingo. This is actually a separate command that you can stream other things to as well.

Below is a visual solution from egg::test_fn! {math_powers, rules(), "(* (pow 2 x) (pow 2 y))" => "(pow 2 (+ x y))"} example.

It’s a bit easier for me to draw the graph as a two-sided graph between eclasses and anodes than the typical “eclass as a dotted box surrounding anodes” visualization. Selected electronic classes are colored red and selected anodes are colored cyan.

The example below is a bash script that displays the callingraph reading.

echo " 

% we may choose to select this enode if we have selected that class of all it's children.
{ sel(E,I) } :- enode(E,I,_,_), selclass(Ec) : child(E,I,Ec).

% if we select an enode in an eclass, we select that eclass
selclass(E) :- sel(E,_).

% It is inconsistent for a eclass to be a root and not selected.
:- root(E), not selclass(E).

% This is *not* the same as saying  selclass(E) :- root(E). 

%#minimize { 1,E : selclass(E) }.
#minimize { C,E,I : sel(E,I), enode(E,I,_,C) }.

% optional constrains to help?
:- enode(E,_,_,_), #count { E,I : sel(E,I)} > 1.


node((E,I)) :- enode(E,I,S,_).
%node((sym,S)) :- enode(E,I,S,_).
node(e(E)) :- enode(E,_,_,_).

edge(((E,I),e(Ec))) :- child(E,I,Ec).
edge((e(E),(E,I))) :- enode(E,I,_,_).
%edge(((E,I),(sym,S))) :- enode(E,I,S,_).

attr(node, (E,I), label, S) :- enode(E,I,S,_).

% visualize solution
attr(node, (E,I), color, cyan) :- sel(E,I).
attr(node, e(E), color, red) :- selclass(E).

%#show sel/2.
%#show selclass/1.

% analysis

%treecost(E,C1) :- enode(E,_,_,_), C1 = #min { C,E,I : treecost(E,I,C) }.
%treecost(E,I,C1 + Cs) :- enode(E,I,_,C1), Cs = #sum { C, Ec : child(E,I,Ec)}.

% :- #sum { C,E,I : sel(E,I), enode(E,I,_,C) } <= #sum { C,E : root(E), treecost(E,C)}.

% is this redundant or not?
% :- sel(E,I), child(E,I,Ec), { sel(Ec,I) : enode(Ec,I,_,_) } = 0. 

" | clingo   
--quiet=1 --outf=2 |  clingraph --view --out=render --format=png --dir=/tmp/clingraph --type=digraph

I could, and maybe should, label Anod’s children in their order, but it doesn’t matter

OK, we can add another axiom to tighten it up
Tree(node) = (forall c, child(c,node) => Tree(c)).

Leaf nodes without children are a tree.

Transitive closure and first-order logic

Why can’t an approach be expressed in first-order logic?

If you try to express accessibility, the system is free to overestimate the concept of accessibility.

echo "

dag(X) :- node(X), sel(Y) : edge((X,Y)).

% visualize solution
 attr(node, N, color, cyan) :- dag(N).

" | clingo --outf=2 |
    clingraph --view --out=render --format=png --dir=/tmp/clingraph --type=digraph

Weighing the chosen one edges is the analog of tree extraction

blocking cycles

f(f(f(a))) = f(f(a))


aegraph may not have these problems. Another point in her favor.


We want to pay attention to sharing because sometimes it might be better

foo(large, large) = foo(large-1, large1-1) Fracture sharing. strange.

foo(bar(bar(x)), bar(bar(x))))) = foo(biz(x), baz(x))

Stupid example:
x+y+z+x+y+w = 2*(x+y) + z + w Well, if we had that explicit rewrite that would be cool. This makes the sharing an explicit sharing.

So we need something that has no syntax? Or we want to avoid expensive “give” rules.

The canonical data plan is the path plan. This shows that Datalog is able to express transitivity/transitive closure

path(X,Y) :- edge(X,Y), path(Y,Z).
istree(X) :- node(X), not path(X,X).

This doesn’t make sense unless we have a choice over children (as in egraph). So why even bother abstracting the problem?

Social network, choose a pyramid scheme. No loops, choose from your friends to max

echo " 
" | clingo   
--quiet=1 --outf=2 |  clingraph --view --out=render --format=png --dir=/tmp/clingraph --type=digraph



Popular Articles