Pattern matching over Ruby objects
I've been writing about recursive structures and how to check whether two objects are similar for a while and I finally got to the code I wanted: recursion-safe "pattern matching" over Ruby objects. This begins to feel more useful, and if anything, it looks better.
Examples
We need a small class to show how this sort of pattern matching works, something as simple as this will do:
class Node attr_accessor :a, :b attr_reader :value def initialize(value, a,b) @a, @b = a, b @value = value end end
Let's proceed to the first pattern: we will match any Node object and capture its instance variables (@value, @a and @b):
o = lambda{ Object.new } pattern = Pattern.new do |p| Node.new(p.capture(:thevalue){ o[] }, p.capture(:left){ o[] }, p.capture(:right){ o[] }) end md = pattern.match Node.new("some value", nil, Node.new("something else", nil, nil)) md[:thevalue] # => "some value" md[:left] # => nil md[:right] # => #<Node:0xb7ddaec0 @value="something else", @b=nil, @a=nil> #regexpish md[0] # => #<Node:0xb7ddaeac @value="some value", @b=#<Node:0xb7ddaec0 @value="something else", @b=nil, @a=nil>, @a=nil> md[1] # => "some value" md[2] # => nil md[3] # => #<Node:0xb7ddaec0 @value="something else", @b=nil, @a=nil> a, b, c = md.captures a # => "some value" b # => nil c # => #<Node:0xb7ddaec0 @value="something else", @b=nil, @a=nil>
I can specify which values must be captured with
p.capture(:capture_name){ value to be matched and captured }
References to named captures
It's been really easy so far. Time to move to something a bit more interesting; I'll add a number of constraints:
- the value held by the node must be a String
- no left-child (nil)
- the right-child must be a node with no children and with the same value as the parent
The last condition can be specified with a reference to a previous capture, e.g.
Pattern.new{|p| [p.capture(:val){ Object.new }, Object.new, p.capture(:val)] }
matches all arrays such that the first element is the same as the last. So let's do it:
pattern = Pattern.new do |p| Node.new(p.capture(:value){""}, nil, Node.new(p.capture(:value), nil, nil)) end # matches a Node with a String value and whose 'right-child' is a node with the # very same value and no children md = pattern.match Node.new("some value", nil, Node.new("some value", nil, nil)) # different values (in the object_id sense), no match md # => nil s = "whatever" md = pattern.match Node.new(s, nil, Node.new(s, nil, nil)) md[:value] # => "whatever"
Recursive captures
No way I could forget recursive stuff, right? Here's another pattern, matching a node whose value is an array with three elements of classes String, Fixnum and String (or derived, as usual), and referring to itself:
pattern = Pattern.new do |p| p.capture(:all){ Node.new([p.capture(:l){ "" }, p.capture(:middle){ 1 }, ""], nil, p.capture(:all)) } end n = Node.new(["foo", 2, "whatever"], nil, Node.new(nil, nil, nil)) pattern.match(n) # => nil n.b = n md = pattern.match(n) md[:l] # => "foo" md[:middle] # => 2
The code
It's based on the code I used to test whether two objects are "structurally equivalent". By creating special Capture objects with suitable #struct_eql? methods, I can record the matches while #struct_eql? proceeds.
# "Pattern matching over Ruby objects" # Copyright (C) 2005 Mauricio Fernandez <mfp@acm.org> # Code subject to the terms of the Ruby license. class Object def struct_object_id object_id end def immediate? [FalseClass, TrueClass, NilClass, Fixnum, Symbol].any?{|x| x === self} end def immediate_class self.class end end class Array def struct_eql?(o, counter = [0], my_rec = {}, his_rec = {}, helper = {}) return true if o.struct_object_id == struct_object_id return false unless self.class === o return false if o.size != size counter[0] += 1 my_rec[struct_object_id] = his_rec[o.struct_object_id] = counter[0] size.times do |i| x, y = self[i], o[i] if [x,y].any?{|object| object.immediate? } return false unless x.struct_eql?(y, counter, my_rec, his_rec, helper) next end xid, yid = x.struct_object_id, y.struct_object_id if Array === x and Array === y if my_rec[xid] || his_rec[yid] if my_rec[xid] != his_rec[yid] return false else # equal next end else return false unless x.struct_eql?(y, counter, my_rec, his_rec, helper) end elsif Array === x # y is not an array... return false else # general case return false unless x.struct_eql?(y, counter, my_rec, his_rec, helper) end end true end end class Object def struct_eql?(other, counter = [0], my_rec = {}, his_rec = {}, helper = {}) return true if other.struct_object_id == struct_object_id return false unless self.class === other return true if immediate? return false unless instance_variables.sort == other.instance_variables.sort counter[0] += 1 my_rec[struct_object_id] = his_rec[other.struct_object_id] = counter[0] instance_variables.map do |iv| x, y = [self, other].map{|obj| obj.instance_variable_get(iv)} xid, yid = [x,y].map{|obj| obj.struct_object_id } xtag, ytag = my_rec[xid], his_rec[yid] if xtag || ytag if xtag == ytag next else return false end end counter[0] += 1 my_rec[xid] = his_rec[yid] = counter[0] return false unless x.struct_eql?(y, counter, my_rec, his_rec, helper) end true end end # Very little checking atm., left for later class Hash def struct_eql?(other, counter = [0], my_rec = {}, his_rec = {}, helper = {}) return false unless self.class === other return false unless self.size == other.size true end end class Pattern Capture = Struct.new(:model) do attr_reader :value def struct_object_id; model.object_id end def struct_eql?(other, counter = [0], my_rec = {}, his_rec = {}, helper = {}) unless [TrueClass, FalseClass].any?{|x| x == self.model.immediate_class} xid, yid = [self.model, other].map{|o| o.struct_object_id} xtag, ytag = my_rec[xid], his_rec[yid] if xtag || ytag if xtag == ytag @value = other return true else return false end end counter[0] += 1 my_rec[model.struct_object_id] = his_rec[other.struct_object_id] = counter[0] end @value = other model.struct_eql?(other, counter, my_rec, his_rec, helper) end end class BooleanMatcher @id = "BooleanMatcher_0" class << self; def new_id; @id = @id.succ end end def initialize; @id = self.class.new_id; end def immediate?; true end def immediate_class TrueClass end def struct_object_id; @id end def struct_eql?(other, counter = [0], my_rec = {}, his_rec = {}, helper = {}) return false unless [FalseClass, TrueClass].any?{|x| x === other} unless helper.has_key?(struct_object_id) helper[struct_object_id] = other end return false if helper[struct_object_id] != other true end end def BOOLEAN; BooleanMatcher.new end MatchedCapture = Struct.new(:name, :index, :value) class MatchData attr_reader :captures, :value def initialize(value, captures) @name_hash = {} @idx_hash = {} @captures = captures.clone.sort_by{|c| c.index}.map{|c| c.value} captures.each do |capture| @name_hash[capture.name] = @idx_hash[capture.index] = capture end @value = value end def [](idx_or_name) return @value if idx_or_name == 0 if @idx_hash.has_key? idx_or_name @idx_hash[idx_or_name].value elsif @name_hash.has_key? idx_or_name @name_hash[idx_or_name].value else nil end end end def initialize @captures = {} @pattern = yield self end def capture(name, &block) raise ArgumentError, "Repeated capture" if @captures.has_key?(name) && block_given? if @captures.has_key?(name) @captures[name].last else capture = create_capture(nil) @captures[name] = [@captures.size + 1, capture] capture.model = yield capture end end def match(other) r = @pattern.struct_eql?(other) if r matched_captures = @captures.map do |name, (idx, capture)| MatchedCapture.new(name, idx, capture.value) end return MatchData.new(other, matched_captures) else nil end end private def create_capture(model) Capture.new(model) end end
Related work
Reg
The first thing I was reminded of when writing the above was Caleb Clausen's Reg:
Reg is a library for pattern matching in ruby data structures. Reg provides Regexp-like match and match-and-replace for all data structures (particularly Arrays, Objects, and Hashes), not just Strings.
It is more powerful than the pattern matching I presented in the sense that it can handle regular languages: my code is missing alternation and Kleene's closure. I'd need a backtracking engine to handle them.
On the other hand, it doesn't do backreferences yet, which the above code handles just fine in a bit over 200 lines, and I don't know how it would behave with recursive data structures.
types.rb
Another somewhat related thing would be Eivind Eklund's types.rb. That was meant to be used in a very different way though:
This is an implementation of type checking for Ruby. The implementation is fairly powerful, and allows the definition of types more or less arbitrarily. There is just one tiny problem with this: I've found it to only get in the way in practice.
Using types.rb, one could specify the valid argument types for a method. The author himself found it nocive enough to warn potential users:
If you want to play with adding extra type checking to Ruby, I recommend this module. However, I do not recommend adding extra type checking (at least not based on strict checking of method parameters.)
However, I wonder if a different use of that (testing) could prove useful. Maybe the vocabulary of the sort of "pattern language"*1 I created could be extended using types.rb's assertions to create more useful patterns.
Further examples
Here are some other examples, just to clarify the semantics, especially regarding immediate values such as true/false.
pattern = Pattern.new{|p| [true, "", true]} pattern.match([true, "", true]).nil? # => false pattern.match([false, "", false]).nil? # => true pattern = Pattern.new{|p| [1, "", 1]} pattern.match([1, "", 1]).nil? # => false pattern.match([2, "", 2]).nil? # => false pattern = Pattern.new{|p| [p.capture(:bool){p.BOOLEAN}, "", true, p.capture(:bool)] } pattern.match([false, "foo", true, false]).nil? # => false pattern.match([false, "foo", false, false]).nil? # => true pattern.match([false, "foo", true, true]).nil? # => true pattern.match([true, "foo", true, true]).nil? # => false pattern.match([false, "foo", true]).nil? # => true K1 = Class.new do def initialize(a,b); @a, @b = a, b end end pattern = Pattern.new do |p| p.capture(:all){ K1.new(p.capture(:internal){Object.new}, K1.new(p.capture(:internal), :b)) } end md = pattern.match K1.new(:a, K1.new(:a,:b)) md[:internal] # => :a md[:all] # => #<K1:0xb7dd621c @b=#<K1:0xb7dd6244 @b=:b, @a=:a>, @a=:a> md = pattern.match K1.new(:a, K1.new(:b, :a)) md.nil? # => true pattern = Pattern.new do |p| p.capture(:all){ K1.new(1, K1.new(K1.new(nil, 1), :b)) } end pattern.match( K1.new(1, K1.new(K1.new(nil, 2), :a)) ).nil? # => true pattern.match( K1.new(1, K1.new(K1.new(nil, 1), :a)) ).nil? # => false Pattern.new{ [1,2,3,1] }.match [3, 5, 6] # => nil md = Pattern.new do |p| p.capture(:all){ [1,2,p.capture(:inside){[1,2]} ] } end.match [2,3,[2,3]] md[1] # => [2, 3, [2, 3]] md[2] # => [2, 3] md.captures X = Struct.new(:foo) pattern = Pattern.new do |p| [1, p.capture(:name){ "" }, {1,1}, p.capture(:value){ X.new(1) } ] end md = pattern.match([3, "this is my name", {1,2}, X.new(2)]) md[1] # => "this is my name" md[2] # => #<struct X foo=2> rec_array_pattern = Pattern.new do |p| p.capture(:array){ [1, p.capture(:array), 1 ] } end rec_array_pattern.match([1, 2, 3]) # => nil rec_array_pattern.match([1, [], 3]) # => nil a = [] a << 1 << a << 1 a # => [1, [...], 1] md = rec_array_pattern.match(a) md[1] # => [1, [...], 1] md[:array] # => [1, [...], 1] b = []; b << b rec_array_pattern.match([1,b,3]) # => nil rec_array_pattern2 = Pattern.new do |p| p.capture(:array){ [1, p.capture(:array), p.capture(:last){1} ] } end md = rec_array_pattern2.match(a) md[:last] # => 1 md[:array] # => [1, [...], 1] class Node2 attr_accessor :children, :value def initialize(value, *children) @value = value @children = children end def <<(x) @children << x end end dg_pattern = Pattern.new do |p| p.capture(:toplevel) do Node2.new(p.capture(:name){""}, Node2.new(1, p.capture(:toplevel)) ) end end a = Node2.new("foo") a << Node2.new(2, a) md = dg_pattern.match(a) md[:name] # => "foo" md[:toplevel] # => #<Node2:0xb7dd091c @children=[#<Node2:0xb7dd0908 @children=[#<Node2:0xb7dd091c ...>], @value=2>], @value="foo"> md = dg_pattern.match(Node2.new("foo", Node2.new(2))) # => nil
杭州千斤顶 2006-07-29 (Sat) 01:16:30
时尚充气千斤顶使用方式打破了常规的液压式千斤顶烦琐,我们是生产千斤顶的企业。
*1 currently limited to p.BOOLEAN and p.capture
Keyword(s):[blog] [ruby] [pattern] [matching] [recursive]
References:[Ruby]