Prototype of a Scannerless, Generalized Left-to-right Rightmost (SGLR) derivation parser for JavaScript

Need a Scannerless, Generalized Left-to-right Rightmost (SGLR) derivation parser for JavaScript? Then my JSGLR-GWT proof-of-concept might interest you. (If you don’t know what a scannerless GLR parser is, then you are probably not interested in writing modular grammars for your programming language implementation anyway — or you might want to read a few sections down).

A word of warning: I’m not claiming it is written in JavaScript, merely for JavaScript. I simply took our old Java-based SGLR implementation and massaged it quite heavily until GWT could spit out a working version for the JavaScript interpreters.

Performance

The performance is, quite frankly, beyond my wildest expectations. I’ve not made a single attempt at performance tweaking for the JavaScript engines yet, and still I manage to parse through 830 lines of Stratego code in ~3 seconds (Chromium 7.0.519.0). For smaller fragments, even Firefox 3.5 manages to keep up: it manages ~20 lines in ~1s (Chromium does that test in 74ms).

To most people, this probably doesn’t sound very fast at all, but I was much worse off when I initially wrote JSGLR back in 2006 — and that used a term(tree) library that wasn’t hacked together over night. All in all, I think there are plenty of performance tweaking opportunities, and the JavaScript engines are getting faster almost by the minute.

Now, another reason for my optimism, is my use case. I’m interested only interested in JavaScript port of SGLR for interactive use. Think web IDE. My aim is to attain interactive speeds when editing text. A few optimization ideas off the top of my head:

  • Tweak the parser to only reparse the lines that have changed. This alone might be enough to survive with the current level of performance.
  • Reduce the extreme amount of garbage created currently by using object pools.
  • Do the initial parse on the server-side, at least for larger files.

What is a Scannerless GLR parser anyway?

(optional reading)

SGLR does not separate your parser into the usual two steps: (1) lexer/tokenizer/scanner followed by (2) LL- or LR-style parsing of the token stream. Instead, SGLR runs the parser on every character. This is why it’s scannerless.

The next point, GLR, is tricker to explain briefly. Remember that a parser takes a string of text and creates a (parse) tree for it. There might, at some stage during the parsing, be multiple possible trees that might fit the text seen so far. This is called an ambiguity. We don’t yet know what is meant by the text. Only by continue reading, will it (hopefully) become apparent. However, this is where the k comes in. LL(k)/LALR(k) lets you look only k tokens into the future. GLR allows you to look all tokens into the future (for SGLR, I should probably say all characters into the future.).

Typically, LL and LALR-parsers, such as the archaic Yacc and Bison parsers, force you to contort your grammar a lot to meet the k-token lookahead. If you have ever run into the dreaded “shift/reduce” conflict, you know what I’m talking about.

The combination of scannerless and GLR gives us something wonderful: grammars are closed under composition. So, if you write grammars for languages L1 and L2, you can just make a new module L3, include L1 and L2, et voila — you have L3 = L1 ∪ L2.

Trying such a stunt with LALR will almost invariably push you into the “shift/reduce”-conflict corner. Admittedly, there is a significant trade-off or gotcha with the (S)GLR approach. And it is very similar to the trade-offs/gotchas with early vs late binding, or static vs dynamic typing.

If, after processing the entire text, there still are multiple possible parse trees, GLR will give you all of them. So, you catch the ambiguity at runtime. But you can use GLR to say more: the set of grammars you can express with GLR is larger than that of LL(k) and LALR(k).

The trade-off is simple: with LL(k)/LALR(k), you are guaranteed to only have one, unambiguous, tree at the end — but the you might not be able to express the language you want to design. With SGLR, you are free to express the full class of context-free grammars, but the parsing result might be a forest of trees. It’s up to you to prune the forest afterwards, finding the tree you want.

I’m not claiming that SGLR is the best solution for all kinds for parsing needs. With great power comes great responsibility. You might not always need great power, and great responsibilty is usually just a liability, anyway;)

Implementation Details

To interested parties, this is what I did:

  • I ripped out the dependence on the CWI ATerm library and (yet again) wrote my own. Mostly hacked together based from scratch, but with the Textual ATerm Format (TAF) parser stolen from the BasicStrategoTerm implementation.
  • Ripped out all references to java.io and replaced with my own stuff, based on strings.
  • Wrote a tiny RemoteServiceServlet to allow the client side to access the parse tables from server side.
  • Added a tiny hack to also fetch parse tables as TAFs and load these on the client side.
  • Various other minor fixes all over the place.

Great things in the Eclipse JDT

Since we started with the KolibriFX project six months ago, I have programmed a lot of Scala and Java, plus some Stratego and JavaScript. I love programming in Scala and Stratego because my implementations are shorter and usually a lot easier to reason about. I often get this warm, fuzzy feeling of having captured something tricky in an elegant fashion.

However, in day-to-day programming, Java has one killer feature that makes up for a lot of cruft in the language: the Eclipse JDT. I recently spent a few days rewriting some of our Scala code to Java, and that really brought the point home. I find that, at least for our code-base, the JDT offers productivity benefits that far outweigh the linguistic disadvantage Java suffers compared to Scala.

The points that follow will be an old hat to programmers familiar with the JDT, though they might serve as a reminder that for all its flaws, the JDT is a rather great tool, after all.

Realtime Compilation

This is probably the biggest tool-provided productivity booster I know of. I’ve always been a huge fan of realtime compilation, because it encourages flow. It brings me into “the zone”, “the flow” or whatever you prefer to call it. Whenever I have to wait for the compiler for more than a few seconds, I lose my flow and switch to another desktop “to remain productive”. This invariably forces an expensive mental context switch, and ultimately reduces my productivity, but without reducing my perception of productivity — I’m still doing stuff, so I feel I’m moving forward, except I’m not.

Realtime compiler error messages and warnings, allow me to mindlessly work through the live updated list until it’s empty. At that point, I run my unit tests and continue flowing until they’re all green. I get to decide how quickly I work. The computer (nearly) always follows my pace, so I’m the performance bottleneck. It makes me feel in control. I easily enter the flow, and I stay there. One thing I use as a flow marker is how often I watch the time. With Scala, 5 minutes rarely go by before I check it, since I’m waiting for the compiler to churn through. With the JDT, an hour may pass, easily.

This is not a bash on Scala. Stratego has the same problem, but we’re working on an incremental compiler to improve the problem there. I predict that both Scala and Stratego will eventually have compilers working at interactive speeds, and the same degree of flow will be attained as with the JDT compiler.

Quick Fixes

Beyond realtime compilation, the JDT offers additional inspiration for up-and-coming languages: its refactorings and quick fixes make life with Java easy and nearly enjoyable. Based on my usage the last week, these are my favourite quick fixes (admittedly, this list changes from week to week, based on the nature of the programming tasks):

Assign to Field

class X {
   X(int y|) {
   }
 }

With the cursor placed after y, quick fix suggests the creation of a new field, private final int y, and also adds the line this.y = y to the constructor body.

Extract to Separate File

File X.java:

class X { ... }
class Y| { ... }

With the cursor after Y, quick fix suggests that the class Y be moved to a separate file, Y.java, and it deals with all the imports at the top of both X.java and Y.java.

Generate Getters and Setters

class X {
   private int x;
   private int y;
}

Creates getX/setX and getY/setY. You can control if you want both get and set on a per-variable basis.

Change Visibility to ‘x’.

File foo/X.java:

package foo;
class X {
   Horoscope computeHoroscope();
}

File bar/Y.java:

package bar;
class Y {
   ...
   x.computeHoroscope()|;
}

If you are in another package and invoke, x.computeHoroscope(), quick fix tells you that friendly access is not possible, and suggests adding public to the computeHoroscope() method declaration.

Add/remove Argument to Match.

File IntPair.java:

class IntPair {
   Pair(int x) {
   }
}

File Main.java:

IntPair x = new IntPair(10, 100)|;

The usage doesn’t match any of the known constructors of IntPair. Quick fix is going to suggest that another argument is added to IntPair#IntPair(int), so that it becomes IntPair#IntPair(int,int).

Change type of ‘x’ to match expression.

String currentTime = new Set<HashMap<String, Integer>>();

The type mismatch prompts quick fix to suggest replacing String with Set<HashMap<String, Integer>> for the variable currentTime. You may also sometimes get away with

currentTime| = new Set<HashMap<String, Integer>>();

and ask quick fix to create new local variable with the cursor placed near currentTime.

Infer generic arguments

Map map = new HashMap<String, String>()|;

Here, quick fix can compute the correct generic arguments for Map (<String, String>).

Import Type ‘x’.

new HashMap<String, String();

Quick fix may be used to quickly import java.util.HashMap. It allows you to stop fussing about the endless imports lists, and it actually works flawlessly 99% of the time (sometimes, it may insert the import in a funny place if your code is not parseable at the time you ask for an import).

Wrapping Up

Most of the above features are great when writing fresh code. However, they’re even better when you need to change code — especially the realtime compilation feature. Refactorings that may take hours in Stratego or Scala are usually done in minutes in Java, thanks to the IDE support. Imagine where we may go with that kind of IDE support for the newer languages…