Parsing is hard!
Rubberduck parsing is being refined again. The first few releases used a hand-made, regex-based parser. It was not only unmaintainable and riddled with bugs, it also had fairly limited capabilities: sure, we could locate procedures, variables, parameters… but going any further than that was daunting and, honestly, unachievable.
Version 1.2 changed everything. We ditched the regex solution and introduced a dependency on ANTLR4 for C#, using an admittedly buggy grammar intended for parsing VB6 code. Still, the generated lexer/parser was well beyond anything we could have done with regular expressions, and we could implement code inspections that wouldn’t have been possible otherwise. That was awesome… but then new flaws emerged as we realized we needed more than that.
Version 1.3 will change everything again. Oh, we’re still using ANTLR and a (now modified) buggy grammar definition, but we’re no longer going to walk the parse tree all the time: we walk it once to collect all declarations, and then once more to collect all usages of every single identifier we can find.
What does that mean?
Imagine a table where, for every declaration, you lay down information such as the code module where you found it, the exact location in the code file, the scope it’s in, whether it’s a class module, a standard module, a procedure, a function, a property getter/letter/setter, an event, a parameter, a local variable, a global constant, etc., and then whether it has a return/declared type, what that type is, and whether it’s assigned a reference upon declaration (“As New”). Basically, the entire code’s wireframe lies on that table.
Then imagine that, for everything on that table, you’re able to locate every single reference to that identifier, whether it’s an assignment, and exactly where that reference is located – at which position in which code file and in which scope.
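To make that table a bit more concrete, here is a minimal sketch in Python. It is not Rubberduck’s actual C# code: the names `Declaration`, `Reference`, `DeclarationTable` and the dotted scope strings (`"Module1.DoSomething"`) are all made up for illustration. The first pass would call `declare`, the second pass would call `add_reference`, and a name is resolved from the innermost scope outwards.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Location:
    module: str
    line: int
    column: int


@dataclass
class Reference:
    location: Location
    scope: str            # hypothetical dotted scope, e.g. "Module1.DoSomething"
    is_assignment: bool


@dataclass
class Declaration:
    name: str
    kind: str             # e.g. "variable", "function", "parameter"
    scope: str            # the scope the declaration lives in
    location: Location
    as_type: Optional[str] = None
    is_self_assigned: bool = False   # declared "As New"
    references: List[Reference] = field(default_factory=list)


class DeclarationTable:
    def __init__(self) -> None:
        self._items: Dict[Tuple[str, str], Declaration] = {}

    def declare(self, declaration: Declaration) -> None:
        # VBA identifiers are case-insensitive, so keys are lower-cased.
        key = (declaration.scope, declaration.name.lower())
        self._items[key] = declaration

    def resolve(self, name: str, scope: str) -> Optional[Declaration]:
        # Look in the given scope first, then in each enclosing scope.
        key = name.lower()
        while True:
            found = self._items.get((scope, key))
            if found is not None or not scope:
                return found
            scope = scope.rpartition(".")[0]

    def add_reference(self, name: str, reference: Reference) -> Optional[Declaration]:
        target = self.resolve(name, reference.scope)
        if target is not None:
            target.references.append(reference)
        return target


# Pass 1: declarations (a module-level "total" and a local one shadowing it).
table = DeclarationTable()
module_total = Declaration("total", "variable", "Module1",
                           Location("Module1", 1, 8), as_type="Long")
local_total = Declaration("total", "variable", "Module1.DoSomething",
                          Location("Module1", 4, 8), as_type="Long")
table.declare(module_total)
table.declare(local_total)

# Pass 2: usages, each attached to the declaration it resolves to.
table.add_reference("Total", Reference(Location("Module1", 5, 4),
                                       "Module1.DoSomething", True))
table.add_reference("total", Reference(Location("Module1", 9, 16),
                                       "Module1.DoSomethingElse", False))
```

The point of the design is that each usage ends up attached to exactly one declaration, so two identifiers that happen to share a name are never confused.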
The result is the ability to tell the difference between an array index assignment and a function call. If VBA can figure out the code, so can Rubberduck.
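Why does that need a declarations table at all? In VBA, array indexing and function calls both use parentheses, so `values(1)` and `GetValue(1)` look the same to the parser. A rough Python sketch of the idea follows, assuming a simplified lookup that maps a lower-cased name to a declared kind; `classify` and `declared_kinds` are hypothetical names, and real VBA has more cases than this (default members and parameterized properties, for instance).

```python
from typing import Dict


def classify(declared_kinds: Dict[str, str], name: str,
             is_assignment_target: bool) -> str:
    """Decide what `name(...)` means, based on how `name` was declared."""
    kind = declared_kinds.get(name.lower())  # VBA names are case-insensitive
    if kind == "array":
        if is_assignment_target:
            return "array element assignment"
        return "array element read"
    if kind == "function":
        return "function call"
    return "unresolved"


# Hypothetical result of the declarations pass:
#   Dim values(1 To 10) As Long
#   Function GetValue(ByVal index As Long) As Long
declared_kinds = {"values": "array", "getvalue": "function"}

first = classify(declared_kinds, "values", True)      # values(1) = 42
second = classify(declared_kinds, "GetValue", False)  # x = GetValue(1)
third = classify(declared_kinds, "Mystery", False)    # never declared
```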
This means the next few versions will start seeing very cool features such as…
- Find all references – from a context menu in the Code Explorer tree view, or heck, why not from a context menu in the code pane itself!
- GoTo implementations – easily navigate to all classes that implement a specific interface you’ve written.
- GoTo anything – type an identifier name in a box, and instantly navigate to its declaration… wherever that is.
- Safe delete – remove an identifier (or module/class) only if it’s not used anywhere; and if it’s used somewhere, navigate to that usage and fix it before breaking the code.
- Rename refactoring – rename foo to bar, and only the “foo” you mean to rename; this isn’t a Find & Replace, not even a RegEx search – it’s a full-fledged “rename” refactoring à la Visual Studio!
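Several of the features above boil down to "use the recorded reference positions instead of searching text". Here is a minimal Python sketch of a rename done that way, assuming the parser has already produced zero-based `(line, column)` positions for the declaration and each of its usages. `rename` and its parameters are hypothetical, not Rubberduck’s API.

```python
from typing import List, Tuple


def rename(lines: List[str], positions: List[Tuple[int, int]],
           old_name: str, new_name: str) -> List[str]:
    """Replace old_name with new_name only at the given (line, column) spots."""
    result = list(lines)
    # Work from the last position backwards, so that earlier columns on the
    # same line are still valid after a replacement changes the line length.
    for line_no, column in sorted(positions, reverse=True):
        text = result[line_no]
        end = column + len(old_name)
        if text[column:end].lower() != old_name.lower():
            raise ValueError(f"no '{old_name}' at line {line_no}, column {column}")
        result[line_no] = text[:column] + new_name + text[end:]
    return result


code = [
    'Dim foo As Long',
    'foo = 1',
    'Debug.Print "foo", foo',
]
# Positions of the declaration and its two usages; the "foo" inside the
# string literal is not an identifier reference, so it is not listed.
foo_positions = [(0, 4), (1, 0), (2, 19)]

renamed = rename(code, foo_positions, "foo", "bar")
```

The string literal on the last line stays `"foo"`, which is exactly what a plain Find & Replace would get wrong. The same list of positions would answer "find all references", and an empty list of usages would mean the identifier is safe to delete.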
Not to mention everything else that’s in store for v1.3 (did I say IDE-Integrated GitHub Source Control?) – stay tuned!
3 thoughts on “Rubberduck 1.3: When the VBE becomes a real IDE”
[…] project is coming along pretty nicely, and for the next version the parsing strategy is getting yet another revision, and this time around it really feels like it’s done […]
Now that would be very very useful when done!
Next you can use ANTLR lexer modes and a simple expression evaluator to filter preprocessor (nested) #if/then/else conditionals.
I really need to look into what else ANTLR can do for us. Feels like we’re only using it to a fraction of its potential… but I have no idea where to start!