A Reflection on VBA Reflection

The idea has always been floating around, and a number of feature requests had been made to support something like it.

Not to mention all the Stack Overflow questions asking how to iterate the members of a module, with answers explaining how to use the VBIDE API to achieve some level of “reflection”.

The VBIDE API works: it gives you the members of a module rather easily. But if you need anything more granular, like iterating the local variables in a given member, you’ll have to write code to manually parse that member’s code, and if you’re trying to programmatically access all references to a global variable, you’re in for lots of “fun”.

Truth is, the VBIDE API is bare-bones, and if you want to do anything remotely sophisticated with it, you basically need to write your own parser.

Rubberduck 2.0 will expose [some of] its guts to VBA clients, through COM reflection – using the ParserState API, you’ll be able to write code like this:

api1

..and leverage the richness of Rubberduck’s own API to iterate not only module members, but any declaration Rubberduck is aware of – that includes everything from modules and their members, down to variables, constants, events, Declare statements, and even line labels. You can also iterate the references of any declaration, so in theory you could implement your own code inspections with VBA and, in conjunction with the VBIDE API, you could even implement your own refactorings

This API can easily grow TONS of features, so the initial 2.0 release will only include a minimalist, basic functionality.

One can easily imagine that this API will eventually enable surgically parsing a given specific class module, and then iterating its public properties to create form controls for each one, e.g. a Boolean property maps to a checkbox, a String property maps to a textbox, etc. – takes dynamic UI generation to a whole new level.

What will you do with that power?

IDE-Integrated Git Source Control

It was merged just yesterday. When Rubberduck 2.0 is released, VBA devs will have a new tool in their arsenal: full-fledged Git source control, seamlessly integrated into their IDE, in a dockable toolwindow:

sc-panel.PNG

Why use Source Control?

How many of you keep version and change tracking information in comments? Does this look any familiar?


'modified 2014-04-27, ticket #173

Or this perhaps?

'version 1.2
' - added CalculateRangePrices macro
' - fixed bug in SalesByCustomerReport

These things don’t belong in code, they belong in your commit history. Using source control gives you that, so you can have code files that contain, well, code.

Collaborative work in VBA is pretty annoying anyway – you have to manually merge changes, and import/export modules manually, if you’re not outright copy/pasting code. This is error-prone and, let’s face it, nobody does it.

With Rubberduck and your GitHub repository two clicks away, you can now work on your code even if you don’t have the macro-enabled workbook with you right now. Because Rubberduck doesn’t just dump the .xlsm file into a GitHub repo – it actually exports each individual code file (yes, including workbooks and worksheets and UserForms) into an actual working directory, so you can work in VBA just like you would in VB6.. or C#, or Java.

You can create a branch, and work on a feature while shielding your “master” branch from these changes, issue a bug-fix on “master”, merge the bug-fix into your dev branch, finalize your work, then merge your dev branch into master, and release a new version: that’s how devs work, and that’s how VBA devs can work too, even if they’re a lone wolf.

Source Control integration was issue #50 in Rubberduck’s repository (we’re now at #1219, some 3,000 commits later) – we wanted this feature all along. And we’re delivering it next release, promise.

Special thanks to @Hosch250, who worked astonishingly hard to make this happen.

See how it works on our wiki (yes, that’s work in progress).

 

v2.0: 75% there

At least according to our issues list.

I’ve been working on quite a lot of things these past few weeks, things that open up new horizons and help stabilize the foundations – the identifier reference resolver.

Precompiler Directives

And we’ve had major contributions too, from @autoboosh: Rubberduck’s parser now literally interprets precompiler directives, which leaves only the “live” code path for the parsing and resolving to work with. This means you can scratch what I wrote in an earlier post about this snippet:

[…] Not only that, but a limitation of the grammar makes it so that whatever we do, we’ll never be able to parse the [horrible, awfully evil] below code correctly, because that #If block is interfering with another parser rule:

Private Type MyType 
  #If DEBUG_ Then
    MyMember As Long
  #Else
    MyMember As Integer
  #End If
End Type

And this (thoroughly evil) one too:

  #If DEBUG_ Then
    Sub DoSomething()
  #Else
    Sub DoSomethingElse()
  #End If
      '...
    End Sub

Rubberduck 2.0 will have no problem with those… as long as DEBUG_ is defined with a #Const statement – the only thing that’s breaking it is project-wide precompiler constants, which don’t seem to be accessible anywhere from the VBIDE API. Future versions might try to hook up the project properties window and read these constants from there though.

Isn’t that great news?

But wait, there’s more.


Project References

One of the requirements for our resolver to correctly identify which declaration an identifier might be referring to, is to know what the declarations are. Take this code:

  Dim conn As New ADODB.Connection
  conn.ConnectionString = "..."
  conn.Open

Until I merged my work last night, the declarations for “ADODB” and “Connection” were literally hard-coded. If your code was referencing, say, the Scripting Runtime library to use a Dictionary object, the resolver had no way of knowing about it, and simply ignored it. The amount of work involved to hard-code all declarations for the “most common” referenced libraries was ridiculously daunting and error-prone. And if you were referencing a less common library, it was “just too bad”. Oh, and we only had (parts of) the Microsoft Excel object model in there; if you were working in Access or Word, or AutoCAD, or any other host application that exposes an object model, the resolver simply treated all references to these types and their members, as undeclared identifiers – i.e. they were completely ignored.

I deleted these hard-coded declarations. Every single one of them. Oh and it felt great!

Instead, Rubberduck will now load the referenced type libraries, and perform some black magic COM reflection to discover all the types and their members, and create a Declaration object for the resolver to work with.

This enables things like code inspections specific to a given type library, that only run when that library is referenced:

ht6vq

It also enables locating and browsing through references of not only your code, but also built-in declarations:

7wp4x

Doing this highlighted a major performance issue with the resolver: all of a sudden, there was 50,000 declarations to iterate whenever we were looking for an identifier – and the resolver did an awful job at being efficient there. So I changed that, and now most identifier lookups are O(1), which means the resolver now completes in a fraction of the time it took before.

There’s still lots of room for improvement with the resolver. I started to put it under test – the unit tests for it are surprisingly easy to write, so there’s no excuse anymore; with that done, we’ll know we’re not breaking anything when we start refactoring it.

One of the limitations of v1.x was that project references were being ignored. Well, the resolver is now equipped to start dealing with those – that’s definitely the next step here.


Module/Member Attributes

Another limitation of v1.x was that we couldn’t see module attributes, so if a class had a PredeclaredId, we had no way of knowing – so the resolver essentially treated classes and standard modules the same, to avoid problems.

Well, not anymore. The first time we process a VBComponent, we’re silently exporting it to a temporary file, and give the text file to a specialized parser that’s responsible for picking up module and member attributes – then we give these attributes to our “declarations” tree walker, which creates the declaration objects. As a result, we now know that a UserForm module has a PredeclaredId. And if you have a VB_Description attribute for each member, we pick it up – the Code Explorer will even be able to display it as a tooltip!


What about multithreading?

I tried hard. Very hard. I have a commit history to prove it. Perhaps I didn’t try hard enough though. But the parser state simply isn’t thread-safe, and with all the different components listening for parser state events (e.g. StateChanged, which triggers the code inspections to run in the background and the code and todo explorers to refresh), I wasn’t able to get the parser to run a thread-per-module and work in a reliable way.

Add to that, that we intend to make async parsing be triggered by a keyhook that detects keypresses in a code pane, parsing on multiple threads and getting all threads to agree on the current parser state is a notch above what I can achieve all by myself.

So unless a contributor wants to step in and fix this, Rubberduck 2.0 will still be processing the modules sequentially – the difference with 1.x is the tremendous resolver performance improvements, and the fact that we’re no longer blocking the UI thread, so you can continue to browse (and modify!) the code while Rubberduck is working.

What’s left to do for this to work well, now that the parsing and resolving is stabilized, is to allow graceful cancellation of the async task – because if you modify code that’s being parsed or resolved, the parser state is stale before the task even completes.

To be continued…

(hint: IDE-integrated source control)