E-graphs are Ground Rewrite Systems 2: E-matching

Another post to lower my barrier to getting ideas out. It’s good to just throw things out. This is not my most polished post.

Buyer beware: I think the basic ideas are correct, but the details of the code as it stands are probably wrong and definitely slow.

I also write way more checks than I can currently cash on lambdas and theories. But I still see promise.

Git commit at time of publication

The ground rewriting system implicitly defines a possibly infinite set of terms: the set of terms that reduce to a subterm of a left- or right-hand side of the system. Taking a term and reducing it is an effective method to check whether a term is in this set, of the form val mem : term -> egraph -> bool.
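Concretely, a minimal self-contained sketch of mem might look like this (a toy reduce and a representation of the GRS as a term TermMap.t from left-hand sides to right-hand sides are stand-ins for the previous post's definitions, not the actual code):

```ocaml
(* A self-contained sketch, not the post's actual code. *)
type term = Const of string | App of term * term

module TermMap = Map.Make (struct
  type t = term
  let compare = compare
end)

(* Rewrite to normal form: reduce subterms, then apply a rule at the root
   and repeat. Terminates because rules decrease the term order. *)
let rec reduce (e : term TermMap.t) (t : term) : term =
  let t =
    match t with Const _ -> t | App (f, x) -> App (reduce e f, reduce e x)
  in
  match TermMap.find_opt t e with Some t' -> reduce e t' | None -> t

(* Membership: a term is in the implicit set if its normal form is a term
   the system mentions (approximated here by the keys and values). *)
let mem (t : term) (e : term TermMap.t) : bool =
  let nf = reduce e t in
  TermMap.exists (fun k v -> k = nf || v = nf) e
```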

To generate/enumerate this set, one starts with all the terms and subterms in the GRS and runs the rules backwards (right to left) to produce the set of terms. This will generate larger and larger terms with respect to the term order being used, which can be useful (you’ll know when to stop). val iter : egraph -> term Seq.t

The matching problem is, given a syntactic pattern, to find terms in this implicit term set and return bindings for the pattern’s variables: val ematch : pat -> egraph -> subst list. In principle this can be implemented using iter and filtering with standard pattern matching. It won’t be efficient, but it’s conceptually simple.

I use the term eclass somewhat interchangeably with the canonical term for that eclass. The set of terms resulting from applying rules backwards is the eclass. Each term in this group reduces to the same smallest representative term under the GRS (since we have a total term order and are building a confluent system), so eclasses and canonical terms are in 1-1 correspondence.

Ematching from top to bottom

The typical way I’ve seen ematching done is top-down ematching, which recurses down the pattern while searching the egraph.

A simple pattern matcher ignoring the egraph would look like this. We represent match failure as returning the empty list and match success as returning a singleton list of the substitution.

let match_ pat t : subst list =
  let rec worker pat t subst =
    match (pat, t) with
    | Const x, Const y ->
        if String.equal x y then [ subst ] else []
    | App (f, x), App (f2, x2) ->
        List.concat_map (fun subst -> worker x x2 subst) (worker f f2 subst)
    | _, Var _ -> failwith "ematch does not support Var in term"
    | Var x, _ -> (
        match List.assoc_opt x subst with
        | None -> [ (x, t) :: subst ]
        | Some t' -> if equal_term t t' then [ subst ] else [])
    | App (_, _), Const _ | Const _, App (_, _) -> []
  in
  worker pat t []

Ematching requires only a slight adjustment to this. Now, in addition to searching for exact syntactic matches, we can also try applying a rewrite step in reverse to make the match work.

Applying the rewrite rules once in reverse is very similar to expanding an eclass into its enodes. The function invert takes the map that defines the GRS and swaps its keys and values. The values on the right-hand side are not necessarily unique, so the inverse is a term list TermMap.t.

let invert (e : egraph) : term list TermMap.t =
  TermMap.fold
    (fun k v acc ->
      TermMap.update v
        (fun ot -> match ot with Some x -> Some (k :: x) | None -> Some [ k ])
        acc)
    e TermMap.empty

let ematch e pat t : subst list =
  let eclass = invert e in
  let rec worker pat t subst =
    match (pat, t) with
    | Const x, Const y ->
        if String.equal x y || equal_term (find e pat) (find e t) then [ subst ]
        else []
    | App (f, x), App (f2, x2) -> (
        (* Boring case that occurs in ordinary matching *)
        List.concat_map (fun subst -> worker x x2 subst) (worker f f2 subst)
        @
        (* Interesting case where we match via a simplification rule *)
        match TermMap.find_opt t eclass with
        | None -> []
        | Some enodes -> List.concat_map (fun t -> worker pat t subst) enodes
        (* The kbo order might protect us here. Naively this looks like an infinite loop *)
        )
    | _, Var _ -> failwith "ematch does not support Var in term"
    | Var x, _ -> (
        match List.assoc_opt x subst with
        | None -> [ (x, t) :: subst ]
        | Some t' -> if equal_term (find e t) (find e t') then [ subst ] else [])
    | App (_, _), Const _ | Const _, App (_, _) -> []
  in
  worker pat t []

In the canonical GRS, no term appears on both a left- and a right-hand side, because such a term would be reduced away during canonicalization by the R-simplify rule. This, combined with the fact that rules always decrease terms with respect to the term order, is why | Some enodes -> List.concat_map (fun t -> worker pat t subst) enodes doesn’t cause an infinite loop (I think).

This assumes we are given a term that we are trying to match against the pattern modulo the equalities in the egraph.

To actually search the egraph, we need to enumerate the possible start terms to match.

let ematch_all e pat =
  let terms =
    TermMap.fold
      (fun t1 t2 s -> TermSet.add t1 (TermSet.add t2 s)) (* wrong i need to add subterms too *)
      e TermSet.empty
  in
  let terms = TermSet.to_seq terms in
  Seq.map (ematch e pat) terms
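As the comment admits, the candidate set should also include subterms. A sketch of a fix, written self-contained here (add_subterms is a hypothetical helper, not the post's code):

```ocaml
type term = Const of string | App of term * term

module TermOrd = struct
  type t = term
  let compare = compare
end

module TermMap = Map.Make (TermOrd)
module TermSet = Set.Make (TermOrd)

(* Add [t] and all of its subterms to the set [s]. *)
let rec add_subterms t s =
  let s = TermSet.add t s in
  match t with Const _ -> s | App (f, x) -> add_subterms f (add_subterms x s)

(* Candidate roots for matching: every subterm mentioned by the GRS. *)
let candidate_terms e =
  TermMap.fold
    (fun lhs rhs s -> add_subterms lhs (add_subterms rhs s))
    e TermSet.empty
```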

Ematching from bottom to top

In top-down matching there is at least one big up-front guess: which root eclass are we looking for the pattern in?

In our use for equality saturation, we’re not really interested in searching within a particular eclass; we’re interested in matching over the entire egraph. It is not obvious that we should break the problem down this way. Maybe it’s better to bulk search the whole thing without enumerating roots first.

For example, say we had a pattern with only one variable. We might as well spend our nondeterministic guessing on just that variable. The variable can only be bound to an eclass in the egraph. So we build the pattern with the guess substituted for the variable, reduce the term, and then check whether that reduction is in the egraph.

All in all, these are capabilities we already had, and the idea is easy to express.

let ( let* ) x f = List.concat_map f x

let bottom_up (e : egraph) pat =
  let eclasses = eclasses e in
  let elist = TermSet.to_seq eclasses |> List.of_seq in
  let rec worker subst pat =
    match pat with
    | Const _s ->
        let t = reduce e pat in
        [ (t, subst) ]
    | Var v -> (
        match List.assoc_opt v subst with
        | None -> List.map (fun eid -> (eid, (v, eid) :: subst)) elist
        | Some eid -> [ (eid, subst) ])
    | App (f, x) ->
        let* f, subst = worker subst f in
        let* x, subst = worker subst x in
        let eid = reduce e (App (f, x)) in
        if TermSet.mem eid eclasses then [ (eid, subst) ] else []
  in
  worker [] pat

Naively, each variable in the pattern adds another lookup loop and another factor of #eclasses to the complexity. Smarter joins and filtering can reduce this (joining on consistent substitutions). It is not a given that top-down matching is better, and bottom-up has simpler access patterns and easier indexing.

I often think back to this discussion of e-matching over ground terms.

Direct bottom-up matching can be seen as closely related to narrowing-based techniques for e-matching. If we reduce a pattern with respect to a ground rewriting system, its ground subterms reduce completely. Perhaps this reduced perspective lends itself to more optimizations than the fully guess-and-check version I described above.

I’m more confident at this point that it’s reasonable to think of egglog as a bottom-up functional logic programming language, in the same way that datalog is a bottom-up prolog, since reduction/narrowing is part of the explanation of what FLP is.

One insight I’m excited about is that I believe bottom-up matching plays nicer with theories than top-down matching, although I haven’t reached the clarity of an application where I’m sure this is true. The point is that for bottom-up matching you really only need a notion of reduction, which is easier to supply in a theory-specific way. You’re also only ever dealing with fully ground terms, whereas top-down matching is in a sense dealing with “less ground” residual patterns.

For example, in AC matching it is difficult to guess the correct assignment of subterms to pattern parts top-down, while bottom-up we just need to normalize (sort and flatten) the current candidate term.
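To make “sort and flatten” concrete, here is a toy sketch (the binary AC symbol add, applied as App (App (Const "add", x), y), and the helpers are hypothetical, not the post's code):

```ocaml
type term = Const of string | App of term * term

(* Flatten nested applications of the AC symbol [sym] into an argument list. *)
let rec flatten sym t =
  match t with
  | App (App (Const f, x), y) when String.equal f sym ->
      flatten sym x @ flatten sym y
  | _ -> [ t ]

(* Rebuild a right-nested application from an argument list. *)
let rec rebuild sym = function
  | [] -> invalid_arg "rebuild: empty argument list"
  | [ t ] -> t
  | t :: rest -> App (App (Const sym, t), rebuild sym rest)

(* AC-normalize: flatten, sort by the structural order, rebuild. *)
let ac_norm sym t = rebuild sym (List.sort compare (flatten sym t))
```

Any two AC-equivalent nestings normalize to the same term, so a plain equality check suffices after normalization.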

Zooming in feels easier to me from the bottom up.

These different approaches are all subsumed by relational e-matching, which can go bottom up, top down, or sideways. But there is simplicity here.

Term orders, analyses, and extraction

Also, I said at one point in the last post that I didn’t know whether a GRS has a notion of extraction, but I’m not so sure that’s true anymore. Using analyses to build a custom ground term order and then just extracting by reducing a term with respect to the GRS seems to work pretty well.

It is interesting to use the term order as the tie breaker in the union-find. This is at least a semantically significant tie breaker, whereas using max/min eid can’t mean much because the eid number itself is not significant (not quite true, since eids tend to correlate with age/distance from the original terms, as in the aegraph).
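As a sketch of what an order-based tie breaker could look like (a toy size-then-structural order, not a real KBO; names hypothetical):

```ocaml
type term = Const of string | App of term * term

let rec size = function Const _ -> 1 | App (f, x) -> size f + size x

(* A toy ground term order: compare by size, break ties structurally.
   A real KBO would also weight symbols; this is only a sketch. *)
let term_order t1 t2 =
  match Int.compare (size t1) (size t2) with 0 -> compare t1 t2 | c -> c

(* Orient a union so the order-larger term rewrites to the order-smaller
   one: the representative is the minimal term under the order. *)
let orient t1 t2 = if term_order t1 t2 <= 0 then (t2, t1) else (t1, t2)
```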

The design space of term orders is very interesting, especially when I only care about ground terms.

Interpretation orders into lattices; see Section 5.3 of TRAAT.

Scope escape

I go back and forth on whether x - x = 0 is OK or not. If x is supposed to be a bound variable, x can slip out of its scope if you use the rule as a right-to-left rewrite. Although it is possible to interpret terms with free variables, so maybe scope escape is not so bad. I suspect a variant of union-find with “scope tagging” on eids could be used to prevent this if it’s worth it (scoped union-finds).

Accidental capture

The clearer problem is accidental capture: foo ?a => λx. foo ?a. If you use de Bruijn indices, you need to shift the free indices in ?a. If you go by names, I don’t know how to do alpha equivalence, and you need to know that x is not free in ?a (a freshness side condition, as in egglog). I don’t know.

  • Are lambda terms really ground?
  • Do I want beta, or is alpha alone good enough?
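If lambdas do get added with de Bruijn indices, avoiding capture means shifting free indices under new binders. A sketch over a hypothetical lambda-extended term type (none of this is in the post's code):

```ocaml
type term =
  | Const of string
  | Var of int (* de Bruijn index *)
  | Lam of term
  | App of term * term

(* Shift the free variables of [t] (indices >= the cutoff [c]) up by [d].
   Instantiating ?a under a fresh binder requires exactly this shift. *)
let rec shift d c t =
  match t with
  | Const _ -> t
  | Var k -> if k >= c then Var (k + d) else Var k
  | Lam b -> Lam (shift d (c + 1) b)
  | App (f, x) -> App (shift d c f, shift d c x)
```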

After messing around with logtk, I decided to experiment more bare-bones, using my own naive term representation. git commit
I constantly have to fight the urge to worry about performance at all. That’s not my concern yet. I don’t even know what exactly I’m trying to build.

I’m going with a curried, applicative form for ground terms. It’s interesting from the point of view of trying HOL without lambdas and eventually adding them.

type term = Const of string | App of term * term

It may be desirable to compress partially applied terms into an App of term array constructor as an optimization. More literal types, Lit of lit, could be interesting.
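A sketch of that compression: a head with its spine of arguments in one array instead of a chain of binary App nodes (the flat type and flatten helper are hypothetical):

```ocaml
type term = Const of string | App of term * term
type flat = FConst of string | FApp of flat * flat array

(* Walk down the App spine, collecting arguments left to right, and pack
   them into one array under the head. *)
let rec flatten (t : term) : flat =
  let rec spine t acc =
    match t with App (f, x) -> spine f (flatten x :: acc) | _ -> (t, acc)
  in
  match t with
  | Const s -> FConst s
  | App _ ->
      let h, args = spine t [] in
      FApp (flatten h, Array.of_list args)
```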

I am specifically fighting the temptation to hash cons everything and memoize term stats like depth, even though it’s so easy.

A feature of OCaml that I like is its s-expression printers and parsers. Creating real parsers isn’t that hard using parser generators, but I find they drain my energy from a project.
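Even hand-rolled, an s-expression printer and parser for the term type fits in a few lines (a sketch with hypothetical to_sexp/of_sexp helpers; in practice sexplib's [@@deriving sexp] gives this for free):

```ocaml
type term = Const of string | App of term * term

(* Print App spines as (f x y ...). *)
let rec to_sexp t =
  let rec spine t acc =
    match t with App (f, x) -> spine f (x :: acc) | _ -> (t, acc)
  in
  match t with
  | Const s -> s
  | App _ ->
      let h, args = spine t [] in
      "(" ^ String.concat " " (List.map to_sexp (h :: args)) ^ ")"

(* A naive parser for the same syntax: pad parens with spaces, tokenize,
   then fold application left-associatively inside each paren group. *)
let of_sexp (s : string) : term =
  let toks =
    s
    |> String.split_on_char '(' |> String.concat " ( "
    |> String.split_on_char ')' |> String.concat " ) "
    |> String.split_on_char ' '
    |> List.filter (fun tok -> tok <> "")
  in
  let rec parse = function
    | "(" :: rest ->
        let hd, rest = parse rest in
        let rec go acc = function
          | ")" :: rest -> (acc, rest)
          | toks ->
              let arg, rest = parse toks in
              go (App (acc, arg)) rest
        in
        go hd rest
    | tok :: rest -> (Const tok, rest)
    | [] -> failwith "unexpected end of input"
  in
  fst (parse toks)
```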

This is the fastest way to get a DSL running, short of an embedded DSL.

There is something very satisfying about having a separate file and watching your language grow.

Lispers must be rolling their eyes at me.

It was inspired and modeled after the workflow I saw in the UW egglog project.

type st = { egraph : egraph; rules : (term * term) list }

let empty_state = { egraph = TermMap.empty; rules = [] }

type action =
  | Rewrite of term * term
  | Assert of term
  | Extract of term
  | Union of term * term
  | Run of int
  | Print
  | Match of term
  | Clear
[@@deriving sexp]

let pp_action ppf a = Sexplib.Sexp.pp_hum ppf (sexp_of_action a)
let info = Format.printf

let run_action st = function
  | Assert t -> { st with egraph = TermMap.add t true_ st.egraph }
  | Rewrite (lhs, rhs) -> { st with rules = (lhs, rhs) :: st.rules }
  | Union (lhs, rhs) -> { st with egraph = union st.egraph lhs rhs }
  | Extract t ->
      info "%a" pp_term (reduce st.egraph t);
      st
  | Run _n ->
      let e = canon st.egraph in
      info "%a@\n" pp_egraph e;
      let e =
        List.fold_left (fun e (lhs, rhs) -> apply_rule e lhs rhs) e st.rules
      in
      { st with egraph = e }
  | Print ->
      Format.printf "%a@\n" pp_st st;
      st
  | Match p ->
      let matches = bottom_up st.egraph p in
      List.iter
        (fun (t, subst) -> Format.printf "%a: %a@\n" pp_term t pp_subst subst)
        matches;
      st
  | Clear -> empty_state

Nelson-Oppen or Shostak style theory combination makes egraphs modulo theories not unusual (https://arxiv.org/abs/1207.3262). But I haven’t seen a satisfying reference on ground completion modulo theories. The rule of thumb I’m aware of from z3 is that you shouldn’t have interpreted symbols in matching patterns, because it doesn’t work well. I’m not sure whether lambda is “just another theory” or not.

Some intriguing theories:

  • Multisets (AC?)
  • sets
  • Linear (Gaussian elimination as a rewrite system)
  • Non-linear (Gröbner bases)
  • Beta normalization?

Normalization by evaluation

What interface should theories present? User propagators in Z3 are one model: give the theory new equalities, and it hands back new equalities. Theories must treat foreign symbols as opaque. The safest conception of a theory is one I could emulate using rules but choose not to, for convenience and speed. AC fits into this category.

A possible rough interface:


type ctx
type t
compare
equal
solve : term -> ctx -> t
inform_eq : term -> term -> ctx -> (t * t) list * ctx



module type Theory = sig
  type 'a t
  type ctx
  val norm : ctx -> 'a t -> 'a t
  val compare : ctx -> ('a -> 'a -> int) -> 'a t -> 'a t -> int 

  val equal : ('a -> 'a -> bool) -> 'a t -> 'a t -> bool
  val reify : term -> term t
  val reflect : term t -> term
end

Generic joining through a term-order lens

Discrimination tries

suffix tries

Ground terms are even easier to index than strings. You only need to search at subterm positions instead of at every position.

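One way to make the string analogy concrete: flatten each term into the symbol sequence a suffix/discrimination trie would index (a sketch; the "@" encoding of application nodes is an assumption). Every subterm is a contiguous slice starting at a symbol position, so only those positions need indexing.

```ocaml
type term = Const of string | App of term * term

(* Preorder serialization: "@" marks an application node, so the arity
   structure is encoded and each subterm is a contiguous token slice. *)
let rec preorder t acc =
  match t with
  | Const s -> s :: acc
  | App (f, x) -> "@" :: preorder f (preorder x acc)
```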
Hash consing, memoizing lattice/size data at the nodes. Stored on the side, or in the hash cons itself?

Pivoting: can we change term orders midstream? Is it sound? Fruitful?

Go to ground, young man.

Contexts

A GRS is even more extreme in the “terms matter” direction in that, as I’m considering it, there are no eids at all. Only terms. It also helps a bit with context, since subterms in less flattened terms can derive context from their parents.

Built-in support for automatically derived zippers as a method for handling context.
