MuPar is a simple parse graph for DSLs and NLP, with the following features:
- graph-based intermediate representation
- runtime closures that extend the lexicon
- imprecise searching
- short-term memory (STM)
Here is the ubiquitous Hello World:
greeting β "hello" "world"
Namespace brackets { } limit the symbols hello and world to greeting:
greeting β hello world {
hello β "hello"
world β "world"
}
Double quotes match literal strings, while single quotes match regular expressions:
year β '(19|20)[0-9][0-9]'
digits β '[0-9]{1, 5}'
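To make the two quote styles concrete, here is a minimal Swift sketch. The `Token` enum and `matchPrefix` helper are hypothetical names, not MuPar's API: a double-quoted leaf compares an exact string, while a single-quoted leaf is compiled with Foundation's `NSRegularExpression` and anchored at the start of the input.

```swift
import Foundation

// Hypothetical token kinds mirroring Par's two quote styles (not MuPar's API).
enum Token {
    case literal(String)   // "hello"             -> exact string
    case pattern(String)   // '(19|20)[0-9][0-9]' -> regular expression
}

// Returns the prefix of `input` matched by `token`, or nil if it does not match.
func matchPrefix(_ token: Token, in input: String) -> String? {
    switch token {
    case .literal(let word):
        return input.hasPrefix(word) ? word : nil
    case .pattern(let pattern):
        // Anchor the expression so it only matches at the start of the input.
        guard let regex = try? NSRegularExpression(pattern: "^(?:" + pattern + ")") else {
            return nil
        }
        let whole = NSRange(input.startIndex..., in: input)
        guard let found = regex.firstMatch(in: input, options: [], range: whole),
              let range = Range(found.range, in: input) else {
            return nil
        }
        return String(input[range])
    }
}

print(matchPrefix(.literal("hello"), in: "hello world") ?? "no match")
print(matchPrefix(.pattern("(19|20)[0-9][0-9]"), in: "1999 party") ?? "no match")
print(matchPrefix(.pattern("(19|20)[0-9][0-9]"), in: "2150") ?? "no match")
```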
Alternation and repetition are supported:
greetings β cough{, 3} (hello | yo+) (big | beautiful)* world?
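Each repetition suffix bounds how many times its element may occur. Below is a minimal sketch of that mapping; `repRange` is a hypothetical function, and MuPar's own handling (including the `~` modifier its `reps` rule accepts) may differ. It ignores spaces inside braces, so `cough{, 3}` reads as zero to three coughs.

```swift
// Hypothetical: map a Par repetition suffix to (min, max) occurrences.
// `max == nil` means unbounded. Returns nil for a malformed suffix.
func repRange(_ suffix: String) -> (min: Int, max: Int?)? {
    switch suffix {
    case "":  return (1, 1)      // no suffix: exactly once
    case "?": return (0, 1)      // optional
    case "+": return (1, nil)    // one or more
    case "*": return (0, nil)    // zero or more
    default:  break
    }
    // Brace forms: {n}, {m,n}, {,n}, {m,}
    guard suffix.count >= 2, suffix.hasPrefix("{"), suffix.hasSuffix("}") else { return nil }
    // Drop the braces and ignore spaces, so "{, 3}" reads like "{,3}".
    let body = String(suffix.dropFirst().dropLast().filter { $0 != " " })
    let parts = body.split(separator: ",", omittingEmptySubsequences: false)
    switch parts.count {
    case 1:
        guard let n = Int(parts[0]) else { return nil }
        return (n, n)
    case 2:
        let lower: Int? = parts[0].isEmpty ? 0 : Int(parts[0])
        guard let lo = lower else { return nil }
        if parts[1].isEmpty { return (lo, nil) }
        guard let hi = Int(parts[1]), hi >= lo else { return nil }
        return (lo, hi)
    default:
        return nil
    }
}
```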
The file test.par contains the line:
events β 'event' eventList()
Then the source in TestNLP+Test.swift attaches to eventList():
root?.setMatch("test show event eventList()", eventListChecker)
and attaches a simple callback to extend the lexicon:
func eventListChecker(_ str: Substring) -> String? {
    // Accept any word beginning with "yo" as a dynamic lexicon entry.
    let ret = str.hasPrefix("yo") ? "yo" : nil
    return ret
}
In a real application, this callback could instead attach to a dynamic calendar or any other third-party API.
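The effect of `setMatch` can be pictured as a small table of closures consulted after the static vocabulary. This sketch is hypothetical: `Lexicon`, `attach`, and `matches` are illustrative names, not MuPar's API. It only shows how a closure registered at runtime can make a previously unknown word such as `yo` match.

```swift
// Hypothetical sketch, not MuPar's API: a lexicon that runtime closures can extend.
typealias MatchHook = (Substring) -> String?

final class Lexicon {
    private let words: Set<String>                // static vocabulary from the grammar
    private var hooks: [String: MatchHook] = [:]  // runtime closures keyed by name()

    init(words: Set<String>) {
        self.words = words
    }

    // Attach (or replace) the closure behind a `name()` leaf.
    func attach(_ name: String, _ hook: @escaping MatchHook) {
        hooks[name] = hook
    }

    // A word matches if it is static vocabulary or any attached closure accepts it.
    func matches(_ word: Substring) -> String? {
        if words.contains(String(word)) { return String(word) }
        for hook in hooks.values {
            if let hit = hook(word) { return hit }
        }
        return nil
    }
}

let lexicon = Lexicon(words: ["test", "show", "event"])
print(lexicon.matches("yo") as Any)   // nil: `yo` is unknown
lexicon.attach("eventList") { $0.hasPrefix("yo") ? "yo" : nil }
print(lexicon.matches("yo") as Any)   // Optional("yo")
```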
Here is the output from ParTests/TestNLP+Test.swift:
⟹ before attaching eventListChecker() - `yo` is unknown
"test show event yo" ⟹ 🚫 failed
⟹ runtime is attaching eventListChecker() callback to eventList()
"test show event eventList()" ⟹ eventList.924 = (Function)
⟹ now `yo` is now matched during runtime
"test show event yo" ⟹ test: 0 show: 0 event: 0 yo: 0 ⟹ hops: 0 ✔️
For NLP, word order may not exactly match the order of the parse tree, so MuPar reports the number of hops (or Hamming distance) from the ideal order.
Output from ParTests/TestNLP+Test.swift:
"test event show yo" ⟹ test: 0 show: 1 event: 0 yo: 1 ⟹ hops: 2 ✔️
"yo test show event" ⟹ test: 1 show: 1 event: 2 yo: 2 ⟹ hops: 6 ✔️
"test show yo event" ⟹ test: 0 show: 0 event: 1 yo: 0 ⟹ hops: 1 ✔️
"test event yo show" ⟹ test: 0 show: 2 event: 0 yo: 0 ⟹ hops: 2 ✔️
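For comparison, the textbook positional Hamming distance between a query and the ideal word order can be computed as below. This is only an illustration of that measure, not MuPar's hop count: for "yo test show event" it gives 4 where the output above reports 6, and for "test show yo event" it gives 2 where the output reports 1.

```swift
// Positional Hamming distance: the number of positions where the typed
// words differ from the ideal order. Defined only for equal-length sequences.
func hammingDistance(_ typed: [String], _ ideal: [String]) -> Int? {
    guard typed.count == ideal.count else { return nil }
    return zip(typed, ideal).filter { pair in pair.0 != pair.1 }.count
}

let ideal = ["test", "show", "event", "yo"]
for query in ["test event show yo", "yo test show event", "test show yo event"] {
    let typed = query.split(separator: " ").map(String.init)
    print(query, hammingDistance(typed, ideal) ?? -1)
}
```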
For NLP, short-term memory sets a time window during which words from a previous query carry over to the next query.
Output from ParTests/TestNLP+Test.swift:
⟹ with no shortTermMemory, partial matches fail
"test show event yo" ⟹ test: 0 show: 0 event: 0 yo: 0 ⟹ hops: 0 ✔️
"test hide yo" ⟹ 🚫 failed
"test hide event" ⟹ 🚫 failed
"hide event" ⟹ 🚫 failed
"hide" ⟹ 🚫 failed
⟹ after setting ParRecents.shortTermMemory = 8 seconds
"test show event yo" ⟹ test: 0 show: 0 event: 0 yo: 0 ⟹ hops: 0 ✔️
"test hide yo" ⟹ test: 0 show: 10 event: 10 yo: 0 ⟹ hops: 20 ✔️
"test hide event" ⟹ test: 0 show: 10 event: 1 yo: 9 ⟹ hops: 20 ✔️
"hide event" ⟹ test: 10 show: 9 event: 0 yo: 8 ⟹ hops: 27 ✔️
"hide" ⟹ test: 9 show: 8 event: 8 yo: 9 ⟹ hops: 34 ✔️
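One way to picture short-term memory is a table of recently matched words with timestamps, where a partial query borrows any remembered words still inside the window (8 seconds in the run above). The sketch below is hypothetical: `ShortTermMemory`, `remember`, `recall`, and `fill` are illustrative names, not the `ParRecents` API, and time is passed explicitly in seconds so the behavior is easy to test.

```swift
// Hypothetical sketch of short-term memory; not MuPar's ParRecents implementation.
struct ShortTermMemory {
    let lifetime: Double                         // seconds a word stays recallable, e.g. 8
    private var lastSeen: [String: Double] = [:] // word -> time it was last matched

    init(lifetime: Double) {
        self.lifetime = lifetime
    }

    // Record the words of a successful query at `time`.
    mutating func remember(_ words: [String], at time: Double) {
        for word in words { lastSeen[word] = time }
    }

    // Words still inside the memory window at `time`.
    func recall(at time: Double) -> Set<String> {
        Set(lastSeen.filter { time - $0.value <= lifetime }.keys)
    }

    // Complete a partial query with remembered words; nil if a required word is
    // neither in the query nor still remembered.
    func fill(_ query: [String], required: [String], at time: Double) -> [String]? {
        let remembered = recall(at: time)
        var result = query
        for word in required where !query.contains(word) {
            guard remembered.contains(word) else { return nil }
            result.append(word)
        }
        return result
    }
}

var memory = ShortTermMemory(lifetime: 8)
memory.remember(["test", "show", "event", "yo"], at: 0)
print(memory.fill(["hide"], required: ["test", "event"], at: 3) as Any)
print(memory.fill(["hide"], required: ["test", "event"], at: 10) as Any)
```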
Here is the Par definition in the Par format:
par β name "β" right+ sub? end_ {
name β '^[A-Za-z_]\w*'
right β or_ | and_ | paren {
or_ β and_ orAnd+ {
orAnd β "|" and_
}
and_ β leaf reps? {
leaf β match | path | quote | regex {
match β '^([A-Za-z_]\w*)\(\)'
path β '^[A-Za-z_][A-Za-z0-9_.]*'
quote β '^\"([^\"]*)\"' // skip \"
regex β '^([i_]*\'[^\']+)'
}
}
paren β "(" right ")" reps
}
sub β "{" end_ par "}" end_?
end_ β '[ \\n\\t,]*'
reps β '^([\~]?([\?\+\*]|\{[,]?\d+[,]?\d*\})[\~]?)'
}
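To picture the graph-based intermediate representation, the productions above (the match, path, quote, and regex leaves, plus sequences, alternation, and repetition) can be mirrored by a recursive Swift enum. This is a hypothetical sketch, not MuPar's node types; `ParExpr` and `render` are illustrative names, and `render` simply prints a node back in Par notation.

```swift
// Hypothetical sketch: a recursive node type mirroring Par's productions.
indirect enum ParExpr {
    case quote(String)                      // "hello" : literal string
    case regex(String)                      // '...'   : regular expression
    case path(String)                       // reference to another rule
    case match(String)                      // name()  : runtime closure hook
    case and([ParExpr])                     // sequence of elements
    case or([ParExpr])                      // alternatives
    case reps(ParExpr, min: Int, max: Int?) // repetition; max == nil is unbounded
}

// Print a node back in Par notation.
func render(_ expr: ParExpr) -> String {
    switch expr {
    case .quote(let text): return "\"\(text)\""
    case .regex(let text): return "'\(text)'"
    case .path(let name):  return name
    case .match(let name): return "\(name)()"
    case .and(let items):  return items.map(render).joined(separator: " ")
    case .or(let items):   return "(" + items.map(render).joined(separator: " | ") + ")"
    case .reps(let inner, let lo, let hi):
        let suffix: String
        switch (lo, hi) {
        case (0, .some(1)):         suffix = "?"
        case (1, .none):            suffix = "+"
        case (0, .none):            suffix = "*"
        case (let m, .some(let n)): suffix = "{\(m),\(n)}"
        case (let m, .none):        suffix = "{\(m),}"
        }
        return render(inner) + suffix
    }
}

let greetings = ParExpr.and([
    .reps(.path("cough"), min: 0, max: 3),
    .or([.path("hello"), .reps(.path("yo"), min: 1, max: nil)]),
    .reps(.or([.path("big"), .path("beautiful")]), min: 0, max: nil),
    .reps(.path("world"), min: 0, max: 1),
])
print(render(greetings))  // cough{0,3} (hello | yo+) (big | beautiful)* world?
```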
Here is a complete Par definition for the functional data flow graph, called Flo:
flo β left right* {
left β (path | name)
right β (hash | time | value | child | many | copyat | array | edges | embed | comment)+
hash β "#" num
time β "~" num
child β "{" comment* flo+ "}" | "." flo+
many β "." "{" flo+ "}"
array β "[" thru "]"
copyat β "@" (path | name) ("," (path | name))*
value β scalar | exprs
value1 β scalar1 | exprs
scalar β "(" scalar1 ")"
scalars β "(" scalar1 ("," scalar1)* ")"
scalar1 β (thru | modu | data | num) {
thru β num ("..." | "…") num dflt? now?
modu β "%" num dflt? now?
index β "[" (name | num) "]"
data β "*"
dflt β "=" num
now β ":" num
}
exprs β "(" expr+ ("," expr+)* ")" {
expr β (exprOp | name | scalars | scalar1 | quote)
exprOp β '^(<=|>=|==|<|>|\*|_\/|\/|\%|\:|in|\,)|(\+)|(\-)[ ]'
}
edges β edgeOp (edgePar | exprs | edgeItem) comment* {
edgeOp β '^([<β][<[email protected]ββ‘ββ>]+|[[email protected]ββ‘ββ>]+[>β])'
edgePar β "(" edgeItem+ ")" edges?
edgeItem β (edgeVal | ternary) comment*
edgeVal β (path | name) (edges+ | value)?
ternary β "(" tern ")" | tern {
tern β ternIf ternThen ternElse? ternRadio?
ternIf β (path | name) ternCompare?
ternThen β "?" (ternary | path | name | value1)
ternElse β ":" (ternary | path | name | value1)
ternCompare β compare (path | name | value1)
ternRadio β "|" ternary
}
}
path β '^(([A-Za-z_][A-Za-z0-9_]*)?[.º˚*]+[A-Za-z0-9_.º˚*]*)'
name β '^([A-Za-z_][A-Za-z0-9_]*)'
quote β '^\"([^\"]*)\"'
num β '^([+-]*([0-9]+[.][0-9]+|[.][0-9]+|[0-9]+[.](?![.])|[0-9]+)([e][+-][0-9]+)?)'
comment β '^([,]+|^[/]{2,}[ ]*(.*?)[\n\r\t]+|\/[*]+.*?\*\/)'
compare β '^[<>!=][=]?'
embed β '^[{][{](?s)(.*?)[}][}]'
}
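The negative lookahead in `num` is what lets a range such as `1...60` split into a number, the `...` operator, and a second number: the alternative `[0-9]+[.](?![.])` accepts `2.` but refuses the `1.` at the start of `1...60`, so only `1` is consumed. A standalone check of the pattern with Foundation's `NSRegularExpression` (this tests the regex only, not MuPar's parser; `leadingNumber` is a hypothetical helper):

```swift
import Foundation

// The `num` pattern from the Flo grammar, written as a Swift raw string.
let numPattern = #"^([+-]*([0-9]+[.][0-9]+|[.][0-9]+|[0-9]+[.](?![.])|[0-9]+)([e][+-][0-9]+)?)"#

// Returns the number matched at the start of `text`, or nil.
func leadingNumber(_ text: String) -> String? {
    guard let regex = try? NSRegularExpression(pattern: numPattern) else { return nil }
    let whole = NSRange(text.startIndex..., in: text)
    guard let found = regex.firstMatch(in: text, options: [], range: whole),
          let range = Range(found.range, in: text) else { return nil }
    return String(text[range])
}

for sample in ["1...60", "2.", "1.5", ".5", "-3e+2"] {
    print(sample, "->", leadingNumber(sample) ?? "no match")
}
```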
Par is vertically integrated with Flo.
Bottom-up restructuring of the parse from user queries