Rubberduck 3.0: January Update

I intended to write about Rubberduck 3.0 progress last December, but things snowballed during the Holidays and here we are two-three weeks later and wow, time flies! Happy New Year dear readers (belatedly, I guess), 2023 is full of promises, and there are very nice things going on that I need to take a moment and share here.

Without any further ado, let’s clear the big news.

3 interlocked gears. Gear 1 is the largest, is labelled "add-in client", and drives gear 2 which is smaller and labelled "LSP server". Gear 2 drives gear 3, another smaller gear labelled "LocalDb server".

The main issues with Rubberduck have always been:

  • Memory consumption: Rubberduck consumes a lot of memory in the host process.
  • Instabilities related to COM interop: various tear-down issues with Office CommandBar and dockable toolwindows.
  • Poor VBIDE extensibility tooling and editor interactions.
  • Logs are difficult to use, it’s not clear what is happening in response to what – even when there’s only a single instance writing to the logs. Adding more logging means making things worse.

With v3 we’re addressing these long-standing issues by taking a number of design decisions early in the development process. These decisions were weighted against their downsides and alternatives, and probably make Rubberduck the first VBIDE add-in to implement a LSP Server for its purposes.

Language Server Protocol

For a while there have been discussions among Rubberduck devs about whether implementing LSP would be a feasible thing to do. It’s a protocol that formalizes all communications between a client (an IDE) and a language server that is used in modern IDEs such as Visual Studio and VSCode; twinBASIC implements it, and Rubberduck 3.0 will implement it too.

By moving all of the language-processing aspects out-of-process into a language server, we immediately tackle memory consumption issues: most of the CPU and memory resources Rubberduck 3.0 will use, are going to be outside of the add-in/host process.

With LSP in place, Rubberduck’s objective to bring editing VBA code in the Visual Basic Editor into the 21st century feels closer than ever.

SQLite

Rubberduck’s LSP implementation will be split in two processes, as the LSP server process will be a client for another server process that will host a SQLite database. SQLite is a lightweight library many applications on many platforms (including mobile!) use to persist data between sessions. The database is a local .db file, and the database engine runs in-process. Rubberduck 3.0 will host a SQLite instance in its own server process, and the LSP server process will communicate with it through JSON-RPC, the same way the add-in communicates with the LSP server.

Instead of keeping hundreds of thousands of objects in memory for quick lookups, Rubberduck will write these objects to the database, and only fetch what it needs to work, which should tremendously help reduce the memory and processing footprint of the add-in host process. Using it as a log target (instead of text files) could reduce in-process disk I/O… and replace it with socket I/O and work happening out-of-process.


Cross-Process Communication

The add-in project has no reference to the server project in the Rubberduck solution, and the calls aren’t late-bound either. What’s happening here is different, and there are implications: Remote Procedure Call (RPC) communications occur through web sockets (WS), using a port between 1024 and 5000. As a result, we need to have Windows Defender Firewall open that port for us:

A screenshot of the moment I knew the socket server worked.

Since everything is local, the port only needs private networks permission to operate. We use JsonRPC to send data through that port, so we’re streaming the bytes of human-readable, plain text JSON.

This new client/server architecture enforces a much more decoupled and robust solution.


Telemetry

Telemetry is considered a potentially controversial feature: it will be completely disabled by default and will have to be selectively opted-in explicitly, but with everything becoming asynchronous, trace logging alone often does not suffice for troubleshooting. By implementing a proper telemetry model, we’re giving ourselves the tools to track a request and all actions that stem from it, across the multiple processes.

Since the project started, the only usage data we ever had was our own biased anecdotal usage: we haven’t the slightest idea of what features are under-used, what features are clearly everyone’s favorites, what inspections are most commonly fired, what inspections are disabled, whether inspections we release disabled by default are ever enabled, etc.

Whether enabled or not, Rubberduck 3.0 will collect detailed telemetry data, and store it locally in the SQLite database, by default clearing any existing data on startup: vital debugging information is present if it’s needed.

Ok I’m opting-in, what gives?

Opting into telemetry will allow a Rubberduck client to automatically upload the telemetry data to a future endpoint on api.rubberduckvba.com (via https), where it will be persisted to a SQL Server database schema. Since there is no need for us to track any users, while still potentially extremely detailed, all telemetry data will be anonymous and impossible to track back to any particular user, computer, organization, or country. The transmitted telemetry data will only ever contain information that was explicitly allowed to be transmitted.

Time will tell how aggregated telemetry data can be used, but with enough data we (that includes you) could gain valuable insights on various points of interest:

  • Rubberduck feature usage statistics
  • LSP performance monitoring and troubleshooting
  • VBA language usage statistics, common issues

By transmitting some or all of your telemetry data, you’ll be helping make Rubberduck better for everyone, just by using it. However should you decide to not opt into it, we understand and respect your decision. Note that TraceTelemetry items are the trace logs, so transmitting them is exactly like sending us your log file for troubleshooting. I’ll make a separate post with all the details around pre-release time, and these features will be exhaustively documented on the website.


Progress?

Having the LSP and Telemetry models is one thing, actually implementing them is another. Last time I said I was going to be focusing primarily on the Rubberduck Editor UI, and I did for a while: the editor was progressing very well and I was making very conclusive tests with an in-process parser when I took the decision to move the parser out-of-process.

I proceeded to read the entire LSP specification and implemented a model for it. Shortly after, I realized that we were potentially going to be running multiple instances of a LSP server at once, and it dawned on me that having as many instances of the SQLite database loaded in memory was not going to be globally efficient… so I decided to pull the SQLite database into its own dedicated server process.

The whole exercise demanded a lot of movement in solution projects and namespaces, but I’m very happy with the results: everything is in its place, and the actual add-in project is pretty much empty!

I started with the server implementation that’s the furthest from the add-in: the SQLite database server. This server speaks to LSP through JSON-RPC, but while Language Server Protocol formalizes how the add-in and the LSP talk to each other, I don’t have such a formal protocol for communications between the LSP and the database… so I’m basing most of it on what I learned with LSP.

How it’s going to work: you start Excel and hit Alt+F11 to bring up the VBE. The Rubberduck add-in gets loaded and starts up, then starts a LSP server process and initializes it. In turn the LSP server starts, and attempts to locate the database server. If the database process isn’t found, the LSP server starts one. The Excel/VBE/Rubberduck client process owns the LSP server process, but nobody owns the database server: when the database has disconnected its last client, it automatically shuts down.

The servers (both database and LSP) are console applications that run silently as background processes. In order to facilitate configuring them, and viewing/reviewing their respective inputs and outputs, I’ve written a small client console application that shows the server console content, lets you easily export it to text files or copy it to the clipboard, etc.

a screenshot showing the Rubberduck.DataServer client console application in the middle of exporting a log output to a .log text file.
Screenshot from before the DataServer UI was moved into its own LocalDbClient project.

The LSP client console application will have an additional Telemetry tab to review, delete, and manually submit telemetry data. Server log trace can be set to verbose or turned off, and the server itself can be instructed to shut down, directly from this application.

When RD3 releases, these client console applications will probably be accessible from an add-in menu, or perhaps they’ll be started together with the add-in and minimized to the system tray… we’ll cross that bridge when we get to the river.

Meanwhile work on the editor itself has taken a backseat, since it wasn’t useful to work on parameter info tooltips and wire up add-in functionality that would have to be later undone to work through the LSP server. All of the proof-of-concept stuff that worked, is still working. It just needs to be wired up to work with LSP requests and notifications, so focus has now shifted to the language server and its database backend.

The next few weeks/months are going to be all about implementing the LSP server, most likely.

Rubberduck.Fakes Gets an Upgrade

One of the objectively coolest features in Rubberduck is the Fakes API. Code that pops a MsgBox for example, needs a way to work without actually popping that message box, otherwise that code cannot be unit tested… without somehow hijacking the MsgBox function. The Fakes API does exactly that: it hooks into the VBA runtime, intercepts specific internal function calls, and makes it return exactly what your test setup …set up.

This API can stop time, or Now can be told to return 1:59AM on first invocation, 1:00AM on the next, and then we can test and assert that some time-sensitive logic survives a daylight savings time toggle, or how Timer-dependent code behaves at midnight.

Let’s take a look at the members of the IFakesProvider interface.

Fakes Provider

Fakes for many of the internal VBA standard library functions exist since the initial release of the feature, although some providers wouldn’t always play nicely together – thanks to a recent pull request from @tommy9 these issues have been resolved, and a merry bunch of additional implementations are now available in pre-release builds:

NameDescriptionParameter names
MsgBoxConfigures VBA.Interaction.MsgBox callsFakes.Params.MsgBox
InputBoxConfigures VBA.Interaction.InputBox callsFakes.Params.InputBox
BeepConfigures VBA.Interaction.Beep calls
EnvironConfigures VBA.Interaction.Environ callsFakes.Params.Environ
TimerConfigures VBA.DateTime.Timer calls
DoEventsConfigures VBA.Interaction.DoEvents calls
ShellConfigures VBA.Interaction.Shell callsFakes.Params.Shell
SendKeysConfigures VBA.Interaction.SendKeys callsFakes.Params.SendKeys
KillConfigures VBA.FileSystem.Kill callsFakes.Params.Kill
MkDirConfigures VBA.FileSystem.MkDir callsFakes.Params.MkDir
RmDirConfigures VBA.FileSystem.RmDir callsFakes.Params.RmDir
ChDirConfigures VBA.FileSystem.ChDir callsFakes.Params.ChDir
ChDriveConfigures VBA.FileSystem.ChDrive callsFakes.Params.ChDrive
CurDirConfigures VBA.FileSystem.CurDir callsFakes.Params.CurDir
NowConfigures VBA.DateTime.Now calls
TimeConfigures VBA.DateTime.Time calls
DateConfigures VBA.DateTime.Date calls
Rnd*Configures VBA.Math.Rnd callsFakes.Params.Rnd
DeleteSetting*Configures VBA.Interaction.DeleteSetting callsFakes.Params.DeleteSetting
SaveSetting*Configures VBA.Interaction.SaveSetting callsFakes.Params.SaveSetting
Randomize*Configures VBA.Math.Randomize callsFakes.Params.Randomize
GetAllSettings*Configures VBA.Interaction.GetAllSettings calls
SetAttr*Configures VBA.FileSystem.SetAttr callsFakes.Params.SetAttr
GetAttr*Configures VBA.FileSystem.GetAttr callsFakes.Params.GetAttr
FileLen*Configures VBA.FileSystem.FileLen callsFakes.Params.FileLen
FileDateTime*Configures VBA.FileSystem.FileDateTime callsFakes.Params.FileDateTime
FreeFile*Configures VBA.FileSystem.FreeFile callsFakes.Params.FreeFile
IMEStatus*Configures VBA.Information.IMEStatus calls
Dir*Configures VBA.FileSystem.Dir callsFakes.Params.Dir
FileCopy*Configures VBA.FileSystem.FileCopy callsFakes.Params.FileCopy
*Members marked with an asterisk are only available in pre-release builds for now.

Parameter Names

The IVerify.ParameterXyz members make a unit test fail if the specified parameter wasn’t given a specified value, but the parameter names must be passed as strings. This is a UX issue: the API essentially requires hard-coded magic string literals in its users’ code; this is obviously error-prone and feels a bit arcane to use. The IFakesProvider interface has been given a Params property that gets an instance of a class that exposes the parameter names for each of the IFake implementations, as shown in the list above, and the screenshot below:

Picking the correct parameter name from a drop-down completion list beats risking a typo, doesn’t it?

Note: the PR for this feature has not yet been merged at the time of this writing.

Testing Without Fakes (aka Testing with Stubs)

Unit tests have a 3-part structure: first we arrange the test, then we act by invoking the method we want to test; lastly, we assert that an actual result matches the expectations. When using fakes, we configure them in the arrange part of the test, and in the assert part we can verify whether (and/or how many times) a particular method was invoked with a particular parameterization.

Let’s say we had a procedure we wanted to write some tests for:

Public Sub TestMe()
    If MsgBox("Print random number?", vbYesNo + vbQuestion, "Test") = vbYes Then
        Debug.Print Now & vbTab & Rnd * 42
    Else
        Debug.Print Now
    End If
End Sub

If we wanted to make this logic fully testable without the Fakes API, we would need to inject (likely as parameters) abstractions for MsgBox, Now, and Debug dependencies: instead of invoking MsgBox directly, the procedure would be invoking the Prompt method of an interface/class that wraps the MsgBox functionality. Unit tests would need a stub implementation of that interface in order to allow some level of configuration setup – an invocation counter, for example. A fully testable version of the above code might then look like this:

Public Sub TestMe(ByVal MessageBox As IMsgBox, ByVal Random As IRnd, ByVal DateTime As IDateTime, ByVal Logger As ILogger)
    If MessageBox.Prompt("Print random number?", "Test") = vbYes Then
        Logger.LogDebug DateTime.Now & vbTab & Random.Next * 42
    Else
        Logger.LogDebug DateTime.Now
    End If
End Sub

The method is testable, because the caller controls all the dependencies. We’re probably injecting an IMsgBox that pops a MsgBox, an IRnd that wraps Rnd, a DateTime parameter that returns VBA.DateTime.Now and an ILogger that writes to the debug pane, but we don’t know any of that. I fact, we could very well run this method with an ILogger that writes to some log file or even to a database; the IRnd implementation could consistently be returning 0.4 on every call, IDateTime.Now could return Now adjusted to UTC, and IMsgBox might actually display a fancy custom modal UserForm dialog – either way, TestMe doesn’t need to change for any of that to happen: it does what it needs to do, in this case fetching the next random number and outputting it along with the current date/time if a user prompt is answered with a “Yes”, otherwise just output the current date/time. It’s the interfaces that provide the abstraction that’s necessary to decouple the dependencies from the logic we want to test. We could implement these interfaces with stubs that simply count the number of times each member is invoked, and the logic we’re testing would still hold.

We could then write tests that validate the conditional logic:

'@TestMethod
Public Sub TestMe_WhenPromptYes_GetsNextRandomValue()
    ' Arrange
    Dim MsgBoxStub As StubMsgBox ' implements IMsgBox, but we want the stub functionality here
    Set MsgBoxStub = New StubMsgBox
    MsgBoxStub.Returns vbYes
    Dim RndStub As StubRnd ' implements IRnd, but we want the stub functionality here too
    Set RndStub = New StubRnd
    ' Act
    Module1.TestMe MsgBoxStub, RndStub, New DateTimeStub, New LoggerStub
    ' Assert
    Assert.Equals 1, RndStub.InvokeCount
End Sub
'@TestMethod
Public Sub TestMe_WhenPromptNo_DoesNotGetNextRandomValue()
    ' Arrange
    Dim MsgBoxStub As StubMsgBox
    Set MsgBoxStub = New StubMsgBox
    MsgBoxStub.Returns vbNo
    Dim RndStub As StubRnd
    Set RndStub = New StubRnd
    ' Act
    Module1.TestMe MsgBoxStub, RndStub, New DateTimeStub, New LoggerStub
    ' Assert
    Assert.Equals 0, RndStub.InvokeCount
End Sub

These stub implementations are class modules that need to be written to support such tests. StubMsgBox would implement IMsgBox and expose a public Returns method to configure its return value; StubRnd would implement IRnd and expose a public InvokeCount property that returns the number of times the IRnd.Next method was called. In other words, it’s quite a bit of boilerplate code that we’d usually rather not need to write.

Let’s see how using the Fakes API changes that.

Using Rubberduck.FakesProvider

The standard test module template defines Assert and Fakes private fields. When early-bound (needs a reference to the Rubberduck type library), the declarations and initialization look like this:

'@TestModule
Option Explicit
Option Private Module
Private Assert As Rubberduck.AssertClass
Private Fakes As Rubberduck.FakesProvider
'@ModuleInitialize
Public Sub ModuleInitialize()
    Set Assert = CreateObject("Rubberduck.AssertClass")
    Set Fakes = CreateObject("Rubberduck.FakesProvider")
End Sub

The Fakes API implements three of the four stubs for us, so we still need an implementation for ILogger, but now the method remains fully testable even with direct MsgBox, Now and Rnd calls:

Public Sub TestMe(ILogger Logger)
    If MsgBox("Print random number?", vbYesNo + vbQuestion, "Test") = vbYes Then
        Logger.LogDebug Now & vbTab & Rnd * 42
    Else
        Logger.LogDebug Now
    End If
End Sub

With an ILogger stub we could write a test that validates what’s being logged in each conditional branch (or we could decide that we don’t need an ILogger interface and we’re fine with tests actually writing to the debug pane, and leave Debug.Print statements in place), but let’s just stick with the same two tests we wrote above without the Fakes API. They look like this now:

'@TestMethod
Public Sub TestMe_WhenPromptYes_GetsNextRandomValue()
    
    ' Arrange
    Fakes.MsgBox.Returns vbYes
    ' Act
    Module1.TestMe New LoggerStub ' ILogger is irrelevant for this test
    ' Assert
    Fakes.Rnd.Verify.Once
End Sub
'@TestMethod
Public Sub TestMe_WhenPromptNo_DoesNotGetNextRandomValue()
    
    ' Arrange
    Fakes.MsgBox.Returns vbNo
    ' Act
    Module1.TestMe New LoggerStub ' ILogger is irrelevant for this test
    ' Assert
    Fakes.Rnd.Verify.Never
End Sub 

We configure the MsgBox fake to return the value we need, we invoke the method under test, and then we verify that the Rnd fake was invoked once or never, depending on what we’re testing. A failed verification will fail the test the same as a failed Assert call.

The fakes automatically track invocations, and remember what parameter values each invocation was made with. Setup can optionally supply an invocation number (1-based) to configure specific invocations, and verification can be made against specific invocation numbers as well, and we could have a failing test that validates whether Randomize is invoked when Rnd is called.

API Details

The IFake interface exposes members for the setup/configuration of fakes:

NameDescription
AssignsByRefConfigures the fake such as an invocation assigns the specified value to the specified ByRef argument.
PassthroughGets/sets whether invocations should pass through to the native call.
RaisesErrorConfigures the fake such as an invocation raises the specified run-time error.
ReturnsConfigures the fake such as the specified invocation returns the specified value.
ReturnsWhenConfigures the fake such as the specified invocation returns the specified value
given a specific parameter value.
VerifyGets an interface for verifying invocations performed during the test. See IVerify.
The members of Rubberduck.IFake

The IVerify interface exposes members for verifying what happened during the “Act” phase of the test:

NameDescription
AtLeastVerifies that the faked procedure was called a specified minimum number of times.
AtLeastOnceVerifies that the faked procedure was called one or more times.
AtMostVerifies that the faked procedure was called a specified maximum number of times.
AtMostOnceVerifies that the faked procedure was not called or was only called once.
BetweenVerifies that the number of times the faked procedure was called falls within the supplied range.
ExactlyVerifies that the faked procedure was called a specified number of times.
NeverVerifies that the faked procedure was called exactly 0 times.
OnceVerifies that the faked procedure was called exactly one time.
ParameterVerifies that the value of a given parameter to the faked procedure matches a specific value.
ParameterInRangeVerifies that the value of a given parameter to the faked procedure falls within a specified range.
ParameterIsPassedVerifies that an optional parameter was passed to the faked procedure. The value is not evaluated.
ParameterIsTypeVerifies that the passed value of a given parameter was of a type that matches the given type name.
The members of Rubberduck.IVerify

There’s also an IStub interface: it’s a subset of IFake, without the Returns setup methods. Thus, IStub is used for faking Sub procedures, and IFake for Function and Property procedures.


When to Stub Standard Library Members

Members of VBA.FileSystem not covered include EOF and LOF functions, Loc, Seek, and Reset. VBA I/O keywords Name, Open, and Close operate at a lower level than the standard library and aren’t covered, either. VBA.Interaction.CreateObject and VBA.Interaction.GetObject, VBA.Interaction.AppActivate, VBA.Interaction.CallByName, and the hidden VBA.Interaction.MacScript function, aren’t implemented.

Perhaps CreateObject and GetObject calls belong behind an abstract factory and a provider interface, respectively, and perhaps CallByName doesn’t really need hooking anyway. In any case there are a number of file I/O operations that cannot be faked and demand an abstraction layer between the I/O work and the code that commands it: that’s when you’re going to want to write stub implementations.

If you’re writing a macro that makes an HTTP request and processes its response, consider abstracting the HttpClient stuff behind an interface (something like Function HttpGet(ByVal Url As String)): the macro code will gain in readability and focus, and then if you inject that interface as a parameter, then a unit test can inject a stub implementation for it, and you can write tests that handle (or not?) an HTTP client error, or process such or such JSON or HTML payload – without hitting any actual network and making any actual HTTP requests.

Until we can do mocking with Rubberduck, writing test stubs for our system-boundary interfaces is going to have to be it. Mocking would remove the need to explicitly implement most test stubs, by enabling the same kind of customization as with fakes, but with your own interfaces/classes. Or Excel’s. Or anything, in theory.


Rubberduck 3.0 Progress Update

The next major version of Rubberduck is currently in [very] early development stages – saying that there is a lot of work ahead would be quite an understatement, but the skeleton is slowly taking shape, and things are looking very, very good.

Since the beginning of the project, Rubberduck’s user interface components (other than dialogs) have always been hosted in traditional, native dockable toolwindows. We built everything on top of the VBIDE editor, using Office CommandBar UI to simulate a status bar and make up for the lack of in-editor integration. Over the years this early design decision slowly became a burden: tearing down the many dockable toolwindows contributed to a pesky access violation crash on exit, low-level hooks for keyboard shortcuts constantly need to detach and re-attach as focus switches between the VBE main window and other applications, autocompletion/self-closing pairs was a nightmare to implement, and while the all-or-nothing approach to parsing made it so that we could always assume we were looking at valid VBA code that could be compiled, it also painted us into a corner where actually moving towards what we wanted Rubberduck to achieve by v3.0 would be extremely difficult, if not impossible.

Behold, the Rubberduck Editor

Rubberduck’s input was always driven by the Visual Basic Editor – now the code in the VBE is going to be output by Rubberduck. Of course, the code will go both ways, but now hidden attributes probably won’t need to be hidden anymore, and the editor can now be exactly what we envision it to be.

There will only be a single toolwindow that will host the editor and UI components like the Code Explorer. At this early stage my focus is entirely on the editor itself, but the idea is ultimately to get actual document tabs and a more practical and friendly docking manager.

Here’s what it looks like as of this writing:

The dropdowns don’t have a real item source yet, but the mock data gives a good idea of what it’s going to be like to edit VBA code with Rubberduck in the future.

Typing “Sub” and hitting the spacebar immediately completes the block and places a new folding node:

The faint dotted underline under “Sub” is a text marker; the editor has the ability to display various such markers at the exact desired position in the document, so we will be using them to show inspection results right there – with tooltips:

Hint-level results will be denoted with this dotted underline indicator; suggestion level will be a green squiggly underline, warnings a blue squiggle, and error level results will appear as red squiggles:

There will also be a new “ducky button” that pops up when the caret is on one such marker, and lets you pick a quick-fix in-place to address an inspection result:


The indenter still needs to be wired up, but this editor will ultimately indent your code as you type it. All the autocompletion features also need to be ported over to work here, and then we’ll want searchable and filterable IntelliSense, parameter info tooltips, and we’ll need to simulate the VBIDE “prettification” that occurs when a line is validated, so that public sub becomes Public Sub and identifiers take the casing they’re declared with.

We get an undo stack that can handle much more than 20 steps, and did I mention the status bar?

For now, all it does is report the current caret position in the editor, but Rubberduck 3.0 will be using it to report parsing progress, instead of the CommandBar button/label we’ve been abusing forever.

There will probably still be a command bar of some sort, but it will be part of the WPF/XAML managed UI; the old Rubberduck CommandBar will be decommissioned.

The one thing that’s 100% guaranteed to not happen in the new Rubberduck editor, is everything that needs to happen beyond design-time: there is no hook into the VBIDE debugger, so Rubberduck has no way of tracking the current instruction. As a result, the editor will be sadly useless in debug mode.


The editor work is just the beginning: Rubberduck 3.0 currently doesn’t even have a parser, let alone any inspections. In the next few months, the very heart of Rubberduck will be reworked to function with the new editor. It’s essentially like rewriting Rubberduck, but with an editor we fully control instead of one we constantly need to fight with.

Meanwhile v2.5.2 is approaching 25K downloads, and there’s quite a bit of work in 2.5.x that hasn’t been “officially” released yet, including everything that happened during a very successful Hacktoberfest 2022: we’ll be releasing v2.5.3 in the near future – stay tuned!

Hello, Rubberduck 2.5.0

Creating the pull request to merge the current [next] branch into [master] is always thrilling: the incredible amount of work that goes into Rubberduck, release after release, never ceases to amaze me. This time (again!), the pull request is well over 1.2K commits. Green-release version 2.4.1.0 was all the way back on March 25, 2019 – which was the Monday that immediately followed the last MVP Global Summit.

What’s new?

If you’ve been keeping up with pre-release builds, you already know. If you’re still using v2.4.1.0 and have the check for newer version at startup setting enabled, your ducky will be telling you about the new build next time you fire up the VBE.

When you update to v2.5, you’ll notice a new option for the check for newer version at startup setting: there’s a new “check for pre-release builds” option that can let you know not only of a new minor version bump, but also for every pre-release build – which effectively means you now get to keep Rubberduck as up-to-date as possible (every merged pull request), without needing to subscribe to GitHub email notifications.

Splash Screen

But the first thing you’ll notice (assuming you haven’t disabled it) will be the splash screen going back to the 2.4.0 yellow ducky splash – if you didn’t know, v2.4.1 was “ThunderFrame Edition” and all this time the splash screen was a nod to our dear friend Andrew Jackson:

Rubberduck’s repository is still filled with hundreds of Andrew’s ideas, and his impact on the project will remain with us forever. This ducky is based on Andrew’s work, too:

I’m not a fan of the font (it’s the same as on the ThunderSplash), but SHOWCARD GOTHIC was getting old and annoyingly too playful-looking. If a graphic artist is reading this and has a nice idea they’d like to contribute, they’re welcome to do so!

But you’re not here to read about the splash screen, are you?

Website/GitHub Integration

In the past, a new green-release meant Rubberduck needed to be deployed to the project’s website itself, so that the /version/build pages could respond with the assembly version of the Rubberduck.dll file deployed. Today the website only needs a Rubberduck build to support the online indenter page, and we only need to update that build to keep the online indenter preview tool up-to-date: if no indenter changes are made, then nothing needs to be updated – the website uses GitHub’s REST API to get the latest pre-release and official “green release” version numbers, but also to download the latest xml-doc from the Rubberduck.CodeAnalysis project, and with that the website’s /inspections/list page will now start identifying the newer inspections that are only available in a pre-release build, versus those present in the latest “green release” (this hasn’t kicked in yet, only because the [master] branch didn’t have any xml-docs to download). The /inspections/details pages are also entirely generated from the in-code xml documentation, including the many examples: we’ll eventually start linking to these pages in the inspection results toolwindow, with “why am I seeing this?” links/buttons.

New Features?

New inspections and new quickfixes, of course – but mostly lots of bugs fixed, and extremely important enhancements to the resolver logic effectively warrant the minor version bump. As mentioned in What’s Cooking for Rubberduck 2.5.x, special attention was given to the resolution of implicit default member calls and bang notation – and with that there’s very, very little early-bound code (if any) that Rubberduck isn’t understanding.

Self-closing pairs aren’t a new feature, but Rubberduck will now ship with the feature enabled by default (was opt-in before). We have been able to hijack and suppress the annoying “beep” that the VBE sounds when the Parameter Quick-Info command doesn’t have anything to show, and this has unlocked restoring automatic quick-info when typing the argument list of a function or procedure call: before that, using self-closing pairs worked pretty nicely, but parameter quick-info had to be manual, which was rather disturbing.

VBA + Source Control

If you’ve been following the project for some time, you probably remember the defunct source control panel – a toolwindow that essentially implemented Visual Studio’s Team Explorer and let you synchronize your VBA project with the files in a git repository. It would also list modified files and let you commit, push, pull, fetch, create new branches, merge them, etc. It failed and isn’t coming back, but the Code Explorer in v2.5 brings back the ability to synchronize the contents of your VBA project from the file system:

Update Components from Files will update existing modules from files in a selected folder, and Replace Contents from Files will make the VBA project mirror the contents of the selected folder (creating new project components/modules as needed). Because Visual Basic 6.0 already works off the file system, in VB6 we only offer the Update Components from Files command.

Keep in mind that while the contents of document modules can be imported, new document modules can’t be added to the host project by the VBE (the host application owns these modules: see this article): for this reason you will want to minimize the amount of code you have in modules like ThisWorkbook and other Worksheet modules in Excel, or in reports & forms in Access. Implementing the actual functionality in separate modules will make things much easier to work with this feature in conjunction with source control (whether you use git, mercurial, SVN, or any other VCS technology).

Visual Studio 2019

Rubberduck has been built with Visual Studio 2017 for quite some time: we have successfully updated all projects in the solution to the awesome new .csproj format, and until now the WPF (Windows Presentation Foundation – the .NET UI framework we use to design our toolwindows and dialogs) dependencies made it impossible to upgrade our build process to work in Visual Studio 2019 until the release of .NET Core 3 last September. This release marks the milestone where we flip the page, sunset Visual Studio 2017 – the first pull request to be merged after v2.5.0, will be one that updates the build process to work with Visual Studio 2019.

If you have forked or cloned Rubberduck, please note that Rubberduck will no longer build in VS2017, as soon as it builds in VS2019.


What Next?

One of the biggest road blocks that’s currently keeping us from implementing a lot of the amazing inspection ideas (and bringing back a proper Extract Method refactoring!), is the lack of proper code path analysis. With that, we’ll have standard tooling that all these inspections can share and reuse (rather than reinvent a rather complex wheel everytime), and then we can tackle the many open Code Path Analysis issues. I’ll be posting an “Inside Rubberduck” article about the architecture and thinking behind this at some point.

Another road block, that’s currently keeping Rubberduck from fully understanding the interfaces it’s looking at, is flicking the switch for our internal TypeLib API, which taps deep into the VBIDE’s guts and gives us visibility on the internal ITypeLib of the VBA project. Rubberduck is already leveraging some of these capabilities (that’s how unit testing works in every VBA host application), but by flicking that switch we’ll be able to, among many other things, pick up the Workbook interface of the ThisWorkbook module… which unlocks fixing a number of long-standing issues and inspection false positives.

Block Completion is another upcoming feature that will possibly be getting my attention in 2020, but not before Code Path Analysis does.

In order to address the growing concerns of performance and memory consumption (especially in larger projects, which currently work best in 64-bit hosts, and possibly not at all in 32-bit hosts), we are exploring implementing a Language Server to offload parsing & resolution out of the host process, similar to how VSCode & Roslyn works, and possibly also moving a lot of the in-memory storage of referenced type libraries’ declarations to an out-of-process database.

Rubberduck v2.4.0

Unlike quite a number of Rubberduck releases, this time we’re not boasting a thousand commits though: we’re looking at well under 300 changes, but if the last you’ve seen of Rubberduck was 2.3.0 or prior, …trying this release you’ll quickly realize why we originally wanted to release it around Christmas.

So, here’s your belated Christmas gift from the Rubberduck dev team!

VBE Project References: CURED!

You may have seen the Introducing the Reference Explorer announcement post last autumn – well, the new feature is now field-tested, works beautifully, instinctively, and is ready for prime time. It’s a beauty!

The add/remove references dialog has seen a number of enhancements since its pre-release: thanks everyone for your constructive feedback!
Quickly locate any type library by name and/or description.
Pin your favorite references, and Rubberduck will keep them handy for all your VBA/VB6 projects.

You’ll never want to use the vanilla-VBE references dialog again!

If you’ve been following the Rubberduck project for quite some time, you may remember something about using annotations together with inspections and quick-fixes to document the presence of module & member attributes. You may also remember when & why the idea was dropped. Keeping in tradition with including new inspections every release… Surprise, it’s coming back!

German, French, and Czech translations have been updated, a number of bugs were fixed in a few inspections, the Code Explorer has seen a number of subtle enhancements, and WPF binding leaks are all but gone.

Code Explorer Enhancements

Adding the Reference Explorer made a perfect opportunity to revisit the Code Explorer toolwindow – our signature navigation feature. Behold, the new Code Explorer:

The new ‘Sync with code pane’ toolbar button (the left/right arrows icon) selects the treeview node closest to the current code pane selection.

There’s a new ‘Library References’ node that shows your project’s library dependencies …whether they’re in use or not:

Find all references can now be used to locate all uses of a given type library – including the built-in standard libraries! Note that rendering lots of search results in a toolwindow will require confirmation if there are too many results to display.

The project reference nodes get new icons:

Classes with a VB_PredeclaredID attribute set to True now have their own icon too (and their names now say (Predeclared) explicitly), and class modules marked with an @Interface annotation now appear with an “interface” icon, like IGameStrategy here:

Annotations & Attributes

They’re back, and this time it does work, and it’s another game changer: Rubberduck users no longer need to export any code file to modify module & member attributes!

Module & Member Annotations

At module level, the @ModuleDescription annotation can be given a string argument that controls the value of the module’s VB_Description attribute; the @Exposed annotation controls the value of the VB_Exposed attribute; the presence of a @PredeclaredId annotation signals a VB_PredeclaredId attribute with a value of True.

At member level, @Description annotations can be given a string argument that controls the value of the member’s VB_Description attribute.

Through inspections, Rubberduck is now able to warn about attributes that don’t have a corresponding annotation, and annotations that don’t have a corresponding attributes. Look for inspection results under the “Rubberduck Opportunities” category.

v2.4.x

The months to come will see further enhancements in several areas; there are several pull requests lined up already – stay tuned, and keep up with the pre-release builds by watching releases on GitHub!

Coming soon, in Rubberduck 2.2

The last “green” release was a couple of months ago already – time to take a step back, look at all we’ve done, and call it a “minor” update.

What’s up duck?

Functionality-wise, not much. Bug fixes, yes; this means fewer inspection false positives, fewer caching accidents, overall more stable usage. But this time some serious progress was also made in the COM & RCW management area, and Rubberduck 2.2 no longer crashes on exit, or leave a dangling host process, or brick the VBE on reload. Some components are still stubbornly refusing to properly release, so unload+reload is still a not-recommended thing to do, but doing so no longer causes access violations. Which is neat, because this particular problem had been plaguing Rubberduck since the early days of 2.0.

Source Control Disintegration

If you haven’t been following the project since v2.1 was released, you may be disappointed to learn that we are officially dropping the source control integration feature. Not saying it’ll never resurface, but the feature was never really stable, and rather than drain our limited resources on a nice but non-essential feature, we focused on the “core” stuff for now. So instead of keeping the half-baked, half-broken thing in place, we removed it – entirely, so there’s 0 chance any part of it interferes with anything else (there were hooks in place, handling parser state changes and some VBE events).

The “Export Project” functionality remains though, so you can still use your favorite source control provider (Git, SVN, Mercurial, etc.) – Rubberduck just isn’t providing a UI to wrap that provider’s functionality anymore.

Shiny & New

We have new inspections! Rubberduck can now tell you when a Case block is semantically unreachable. Or when For loops specify a redundant Step 1, or if you prefer having an explicit Step clause everywhere, it can tell you about that too. Another inspection warns about error-handling suppression (On Error Resume Next) that is never restored (On Error GoTo 0). If you’re unfortunate enough to encounter the thoroughly evil Def[Type] statements, you’ll be relieved to know that Rubberduck will now warn you about implicitly typed identifiers.

Code Metrics is an entirely new tool, that evaluates cyclomatic complexity and nesting levels of each method and module. The feature clearly needs some UI work (wink wink, nudge nudge, C#/WPF reader), and enhancement ideas are always welcome.

The unit test execution engine no longer invokes the host application. There’s a bit of black magic going on here, but to keep it simple, the unit testing feature now works in every single VBE host application.

But the most spectacular changes aren’t really tangible, user-facing things. We’ve streamlined settings, upgrated our grammars from Antlr4.3 to Antlr4.6 – which fixed a number of parser issues, including significant performance improvements when parsing long Boolean expressions; the IInspection interface was fine-tuned again, COM object references were removed in a number of critical places. If you have a fork of the project, you already know that we’ve split Rubberduck.dll into Rubberduck.Core.dll and Rubberduck.Main.dll, with the entry point and IoC configuration in ‘Main’.

Oh, I lied. One of the most spectacular changes is a tangible, user-facing thing. It’s just not exactly in the main code base, is all. Poor installer, always gets left behind.

Administrative Privileges no longer needed!

Since a couple of pre-release builds, the Rubberduck installer supports per-user installs that no longer require admin privs. This means Rubberduck can now be installed on a locked-down workstation, without requiring IT intervention! This revamped installer also detects and properly uninstalls a previous Rubberduck install (admin elevation would be required to uninstall a per-machine installation of a previous build though), so manually uninstalling through the control panel before upgrading, is no longer recommended/needed. Doesn’t hurt, but shouldn’t change anything, really.

The “installating / instructions” and “contributing / initial setup” wiki pages have been updated accordingly on GitHub.

This new installer no longer assumes Microsoft Office is present, and registers for both 32 and 64-bit host applications.


That’s it? What happened to the rest of 2.1.x?

I did say “minor update”, yeah? The previously announced roadmap for 2.1.x was too ambitious, and not much of it is shipping in this release. In fact, that roadmap should have said “2.x”… versioning is hard, okay? If we stuck to 2.1.x, then a v2.2 would have been moot, since by then we would have had much of 3.0 in place.

Anyway, 2.2 is a terrific improvement over 2.1, on many levels – and that can only mean one thing: that the current development cycle will inevitably lead to even more awesomeness!

RD2018

Inside Rubberduck (pt.2)

https://www.cgtrader.com/free-3d-print-models/art/sculptures/rubber-duck-voronoi-style
“Rubber Duck” – Voronoi Style Free 3D print model by Roman Hegglin

Last time I went over the startup and initialization of Rubberduck, and I said I was going to follow it up with how the parser and resolver work.

Just so happens that Max Dörner, who has pretty much owned the parser and resolver parts of Rubberduck since he joined the project (that’s right – jumped head-first into some of the toughest, most complicated code in the project!), has nicely documented the highlights of how parsing and resolving works.

So yeah, all I did here was type up an intro. Buckle up, you’re in for a ride!

Part 2: Parsing & Resolving

Rubberduck processes the code in all unprotected modules in a five-step process. First, in the parser state Pending, the projects and modules to parse are determined. Then, in the parser state LoadingReferences, the references currently used by the projects, e.g. the Excel object model, and some built-in declarations are loaded into Rubberduck. Following this, the actual processing of the code begins. Between the parser states Parsing and Parsed the code gets parsed into parse trees with the help of Antlr4. Following this, between the states ResolvingDeclarations and ResolvedDeclarations the module, method and variable declarations are generated based on the parse tree. Finally, between the states ResolvingReferences and Ready the parse trees are walked a second time to determine the references to the declarations within the code.

At each state change, an event is fired which can be handled by any feature subscribing to it, e.g. the CodeExplorer, which listens for the state change to ResolvedDeclarations.

A More Detailed Story

The entry point for the parsing process is the ParseCoordinator inside the Rubberduck.Parsingassembly. It coordinates the parsing process and is responsible for triggering the appropriate state changes at the right time, for which it uses a IParserStateManager passed to it. To trigger the different stages of the parsing process, the ParseCoordinator uses a IParsingStageService. This is a facade passed to it providing a unified interface for calling the individual stages, which are all implemented in an individual set of classes. Each has a concurrent version for production and a synchronous one for testing. The latter was needed because of concurrency problems of the mocking framework.

General Logistics

Every parsing run gets executed in fresh background task. Moreover, to always be in a consistent state, we allow only one parsing run to execute at a time. This is achieved by acquiring a lock in a top level method. This top level method is also the point at which any cancellation or unexpected exception will be caught and logged.

The first step of the actual parsing process is to set the overall parser state to Pending. This signals to all components of Rubberduck that we left a fully usable state. Afterwards, we refresh the projects cache on the RubberduckParserState asking the VBE for the loaded projects and then acquire a collection of the modules currenlty present.

Loading References

After setting the overall parser state to LoadingReferences, the declarations for the project references, i.e. the references selected in Tools –> References... , get loaded into Rubberduck. This is done using the ReferencedDeclarationsCollector in the Rubberduck.Parsing.ComReflectionnamespace, which reads the appropriate type libraries and generates the corresponding declarations.

Note that the order in the References dialog determines what procedure or field an identifier resolves to in VBA if two or more references define a procedure or field of the same name. This prioritization is taken into account when loading the references.

Unfortunately, we are currently not able to load all built-in declarations from the type libraries: there are some hidden members of the MSForms library, some special syntax declarations like LBound and everything related to Debug, and aliases for built-in functions like Left, where Leftis the alias for the actual hidden function defined in the VBA type library. These get loaded as a set of hand-crafted declarations defined in the Rubberduck.Parsing.Symbols.DeclarationLoadersnamespace.

Parsing the Code

At the start of the processing of the actual code, the parser state is set to Parsing. However, this time this is achieved by setting the individual modules states of the modules to be parsed and then evaluating the overall state.

Each module gets parsed separately using an individual ComponentParseTask from the Rubberduck.Parsing.VBA namespace, which is powered by the Antlr4 parser generator. The end result is a pair of two parse trees providing a structured representation of the code one time as seen in the VBE and one time as exported to file.

The general process using Antlr is to provide the code to a lexer that turns the code into a stream of tokens based on lexer rules. (The lexer rules used in Rubberduck can be found in the file VBALexer.g4 in the Rubberduck.Parsing.Grammar namespace.) Then this token stream gets processed by a parser that generates a parse tree based on the stream and a set of parser rules describing the syntactic rules of the language. (The VBA parser rules used in Rubberduck can be found in the file VBAParser.g4 in the Rubberduck.Parsing.Grammar namespace. However, there are more specialized rules in the project). The parse tree then consists of nodes of various types corresponding to the rules in the parser rules.

Even when counting the Antlr workflow described above as one step, the actual parsing process in the ComponentParseTask is a multi stage process in itself. This has two reasons: there are precompiler directives in VBA and some information regarding modules is hidden from the user inside the VBE, namely attributes.

The precompiler directives in VBA allow to conditionally select which code is alive. This allows to write code that would only be legal VBA after evaluating the conditional compilation directives. Accordingly, this has to be done before the code reaches the parser. To achieve this, we parse each module first with a specialized grammar for the precompiler directives and then hide all tokens that are dead after the evaluation from the VBA parser, including the precompiler directives themselves, by sending the tokens to a hidden channel in the tokenstream. Afterwards, the dead code is still part of the text representation of the tokenstream by disregarded by the parser.

To cover both the attributes, which are only present in the exported modules, and provide meaningful line numbers in inspection results, errors and the command bar, we parse both the attributes and the code as seen in the VBE code pane into a separate parse tree and save both on the ModuleState belonging to the module on the RubberduckParserState.

One thing of note is that Antlr provides two different kinds of parsers: the LL parser that basically parses all valid input for every not indirectly left-recursive grammar (our VBA grammar satisfies this) and the SLL parser, which is considerably faster but cannot necessarily parse all valid input for all such grammars. Both parsers are guaranteed to yield the same result whenever the parse succeeds at all. Since the SLL parser works for next to all commonly encountered code, we first parse using it and fall back to the LL parser if there is a parser error.

Following the parse, the state of the module is set to Parsed on a successful parse and to ParserError, otherwise. After all modules have finished parsing, the overall parser state is evaluated. If there has been any parser error, the parsing process ends here.

Resolving Declarations

After parsing the code into parse trees, it is time to generate the declarations for the procedures, functions, properties, variables and arguments in the code.

First, the state of all modules gets set to ResolvingDeclarations, analogous to the start of parsing the code. Then the tree walker and listener infrastructure of Antlr is used to traverse the parse trees and generate declarations whenever the appropriate grammar constructs are encountered. This is done inside the implementations of IDeclarationResolveRunner in the Rubberduck.Parsing.VBAnamespace.

Note that there is still some information missing on the declarations at this point that cannot be determined in this first pass over the parse trees. E.g. the supertypes of classes implementing the interface of another class are not known yet and, although the name of the type of each declaration is already known, the actual type might not be known yet. For both cases we first have to know all declarations.

After the parse trees of all modules have been walked, the overall parser state gets set to ResolvedDeclarations, unless there has been an error, which would result in the state ResolverError and an immediate stop of the parsing run.

Resolving References

After all declarations are known, it is possible to resolve all references to these declarations within the code, beit as types, supertypes or in expressions. This is done using the implementations of IReferenceResolveRunner in the Rubberduck.Parsing.VBA namespace.

First, the state of the modules for which to resolve the references gets set to ResolvingReferencesand the overall state gets evaluated. Then the CompilationPasses run. In these the type names found when resolving the declarations get resolved to the actual types. Moreover, the type hierarchy gets determined, i.e. super- and and subtypes get added to the declarations based on the implements statements in the code.

After that, the parse trees get walked again to find all references to the declarations. This is a slightly complicated process because of the various language constructs in VBA. As a side effect, the variables not resolving to any declaration get collected. Based on these, new declarations get created, which get marked as undeclared. These form the basis for the inspection for undeclared variables.

After all references in a module got resolved, the module state gets set to Ready. If there is some error, the module state gets set to ResolverError. Finally, the overall state gets evaluated and the parsing run ends.

Handling State Changes

On each change of the overall state, an event is raised to which other features can subscribe. Examples are the CodeExplorer, which refreshes on the change to ResolvedDeclarations, and the inspections, which run on the change to Ready.

Handling any state change but the two above is discouraged, except maybe for the change to Pending or the error states if done to disable things. The problem with the other states is that they may never be encountered during a parsing run due to optimizations. Moreover, Rubberduck is generally not in a stable state between Pending and ResolvedDeclarations. Features requiring access to references should generally only handle the Ready state.

Events also get raised for changes of individual module states. However, it should be preferred to handle overall state changes because module states change a lot, especially in large projects.

IMPORTANT: Never request a parse from a state change handler! That will cancel the current parse right after the handlers for this state in favor of the newly requested one.

Doing Only What Is Necessary

When parsing again after a successful parsing run, the easiest way to proceed is to throw away all information you got from the last parsing run and start from scratch. However, this is quite wasteful since typically only a few modules change between parsing runs. So, we try to reuse as much information as possible from prior parsing runs. Since our VBA grammar is build for parsing entire modules the smallest unit of reuse of information we can work with is a module.

We only reparse modules that satisfy one of three conditions: they are new, modified, or not in the state Ready. For the first two conditions it should be obvious why we have to reparse such modules. The question is rather how we evaluate these conditions.

To be able to determine whether a module has changed, we save a hash of the code contained in the module whenever the module gets parsed successfully. At the start of the parsing run, we compare the saved hash with the hash of the corresponding freshly loaded component to find those modules with modified content. In addition we save a flag on the module telling us whether the content hash has ever been saved. If this is not the case, the module is regarded as new.

For the third condition the question is rather why we also reparse such modules. The reason is that such modules might be in an invalid state although the content hash had been written in the last parsing run. E.g. they might have encountered a resolver error or they got parsed successfully in the last parsing run, but the parsing run got cancelled before the declarations got resolved. In these cases the content hash has already been saved so that the module is neither considered to be new nor modified. Consequently, it would not be considered for parsing and resolving if only modules satisfying one of the first two conditions were considered. Because of the possibility of such problems, we rather err on the save side and reparse every module that has not reached the success state Ready.

Since reparsing makes all information we previously acquired about the module invalid, we have to resolve the declarations anew for the modules we reparse. Fortunately, the base characteristics of a declaration only depend on the module it is defined in. So, we only have to resolve declarations for those modules that get reparsed. For references the situation is more complicated.

Since all declarations from the modules we reparse get replaced with new ones, all references to them, all super- and subtypes involving the reparsed modules and all variable and method types involving the reparsed modules are invalid. So, we have to re-resolve the references for all modules that reference the reparsed modules. To allow us to know which modules these are we save the information which module references which other modules in an implementation of IModuleToModuleReferenceManager accessed in the ParseCoordinator via the IParsingCacheService facade. This information gets saved whenever the references for all modules have been resolved successfully, even before evaluating the overall parser state.

In addition to the modules that reference modules that got reparsed, we also re-resolve those modules that referenced modules or project references having just been removed. This is necessary because the references might now point to different declarations. In particular, a renamed module is treated as unrelated to the old one. This means that renaming a module looks to Rubberduck like the removal of the old module and the addition of a new module with a new name.

The final optimization in place on a reparse is that we do not reload the referenced type libraries or the special built-in declarations every time. We just reload those we have not loaded before.

Caching and Cache Invalidation

If you have read the previous paragraph, you might have already realized that the additional speed due to only doing what is necessary comes at a cost: various types of cached data get invalid after parsing and resolving only some modules. So we have to remove the data at a suitable place in the parsing process. To achieve this the ParseCoordinator primarily calls different methods from the IParsingCacheService facade handed to it.

In the next sections we will work our way up from cache data for which you would probably seldom realize that we forgot to remove it to data for which forgetting to remove it sends the parser down in flames. After that, we will finish with a few words about refreshing the DeclarationFinder on the RubberduckParserState.

Invalid Type Declarations

The kind of cache invalidation problem you would probably not realize is that the type as which a variable is defined has to be replaced in case it is a user defined class and the class module gets reparsed; it now has a different declaration. This would probably just cause some issues with some inspections because the actual IdentifierReference tying the identifier to the class declaration is not related to the type declaration we save. Fortunately, the TypeAnnotationPass works by replacing the type declaration anyway. So, we just have to do that for all modules for which we resolve references.

Invalid Super- and Subtypes

As mentioned in the section about resolving references, we run a TypeHierarchyPass to determine the super- and subtypes of each class module (and built-in library). After reparsing a module, we have to re-resolve its supertypes. However, we also have to remove the old declaration of the module itself from the supertypes of its subtypes and from the subtypes of its supertypes, which has some further data invalidation consequences. Otherwise, the “Find all Implementations” dialog or the rename refactoring might produce …interesting results for the affected modules.

The removal of the super- and subtypes is performed via an implementation of ISupertypeCleareron all modules we re-resolve, including the modules we reparse, before clearing the state of the modules to be reparsed. Here, a removal of the supertypes is sufficient because everything is wired up such that manipulating the supertypes automatically triggers the corresponding change on the subtypes.

Invalid Module-To-Module References

As with all other reference caches, part of our cache saving which module references which other modules can become invalid when we re-resolve a module; it might just be that the the part referencing another module is gone. Fortunately, the way these references are saved does not depend on the actual declarations. So reparsing alone does not cause problems. This allows us to defer the removal of the module-to-module references to the reference resolver.

Being able to postpone the removal until we resolve references is fortunate because of potential problems with cancellations. We use the module-to-module references to determine which modules need to be re-resolved. If they got removed and the parsing run got cancelled before they got filled again in the reference resolver, we would potentially miss modules we have to re-resolve. Then the user would need to modify the affected modules in order to force Rubberduck to re-resolve them.

To handle this problem, the reference resolver itself has a cache of the modules to resolve, which is only cleared at the very end of its work. This is safe because the reference resolver only ever processes modules for which it can find a parse tree on the RupperduckParserState.

Invalid References

Invalid IndentifierReferences to declarations from previous parsing runs can cause any number of strange behaviors. This can range from selections referring to references that have once been at that line and column but having been removed in the meantime to refactorings changing things they really should not change.

It is rather clear that the references from all modules to be re-resolved should be removed. However, this is not as straightforward as it seems. The problem is that the references live in a collection on the referenced declaration and not in a collection attached to the module whose code is referencing the declaration. In particular, this makes it easy to forget to remove references from built-in declarations. To avoid such issues, we extracted the logic for removing references by a module into implementations of IReferenceRemover, which is hidden behind the IParsingCacheService facade.

Modules And Projects That No Longer Exist

Now we come to the piece where everything falls to pieces if we are not doing our job, modules and projects that get removed from the VBE. The problem is that some functionality like the CodeExplorer has to query information from the components in the VBE via COM Interop. If a component does no longer exist when the information gets queried, the parsing run will die with a COMException and there is little we can do about that. So we have to be careful to remove all declarations for no longer existing components right at the start of the parsing run.

To find out which modules no longer exist, we simply collect all the modules on the declarations we have cached and compare these to the modules we get from the VBE. More precisely, we compare the identifiers we use for modules, the QualifiedModuleNames. This will also find modules that got renamed. Projects are bit more tricky since they are usually treated as equal if their ProjectIds are the same; we save these in the project help file. Thus, we have to take special care for renamed projects. Knowing the removed projects, their modules get added to the removed modules as well.

Removing the data for removed modules and projects is a bit more complicated than for modules that still exist. After their declarations got removed, there is no sign anymore that they ever existed. So, we have to take special care to remove everything in the right order to guarantee that all information is gone already when we erase the declaration; after each step, the parsing run might be cancelled.

The final effect of removing modules is that the modules referencing the removed modules need to be re-resolved. Intuitively one might think that this will always result in a resolver error. However, keep in mind that renaming is handled as removing a module and adding another. Then the references will simply point to the new renamed module. Because of possible cancellations on the way to resolving the references, we immediately set the state of the modules to be re-resolved to ResolvingReferences. This has the effect that they will be reparsed in case of a cancellation.

Note that basically the same procedure is also necessary whenever we reload project references. Accordingly, we do this right after unloading the references, without allowing cancellations in between.

Refreshing the DeclarationFinder

Since declarations and references change in nearly all steps of the parsing process, we have to refresh our primary cached source of declarations, the DeclarationFinder, quite regularly when parsing. Unfortunately, this is a rather computation intensive thing to do; a lot of dictionaries get populated. So, we refresh only if we need to. E.g. we do not refresh after loading and unloading project references in case nothing changed. However, there are two points in each parsing run where we always have to refresh it: before setting the state to ResolvedDeclarations and before evaluating the overall state at the end of the parsing run, which results in the Ready state in the success path.

Refreshing before the change to ResolvedDeclarations is necessary to ensure that removed modules vanish from the DeclarationFinder before the handlers of this state change event run, including the CodeExplorer. We have to refresh again at the end because, from inside the ParseCoordinator, we can never be sure that the reference resolver did not do anything; it has its own cache of modules that need to be resolved.

One optimization done in the DeclarationFinder itself is that some collections are populated lazily, in particular those dealing only with built-in declarations. This saves the time to rebuild the collections multiple times on each parsing run. However, there is a price to pay. The primary users of the DeclarationFinder are the reference resolver and the inspections, both of which are parallelized. Accordingly, it can happen that multiple threads race to populate the collections. This is bad for the performance of the corresponding features. So, we make compromises by immediately populating the most commonly used collections.

Inside Rubberduck (pt.1)

https://www.cgtrader.com/free-3d-print-models/art/sculptures/rubber-duck-voronoi-style
“Rubber Duck” – Voronoi Style Free 3D print model by Roman Hegglin

Maybe you’ve browsed Rubberduck’s repository, or forked it to get a closer look at the source code. Or maybe you didn’t but you’re still curious about how it might all work.

I haven’t written a blog post in quite a long while (been busy!), so I thought I’d start a series that describes Rubberduck’s internals, starting at the beginning.

Part I: Starting up

Rubberduck embraces the Dependency Injection principle: depend on abstractions, not concrete implementations. Hand-in-hand with DI, the Inversion of Control principle describes how all the decoupled pieces come together. This decoupling enables testable code, which is fundamental when your add-in has a unit testing framework feature in any project of that size.

Because RD is a rather large project, instead of injecting the dependencies (and their dependencies, and these dependencies’ dependencies, and so on…) “by hand”, we use Ninject to do it for us.

We configure Ninject in the Rubberduck.Root namespace, more specifically in the complete mess of a class, RubberduckModule. I say complete mess because, well, a couple of things are wrong in that file. How it steals someone else’s job by constructing the menus, for example. Or how it’s completely under-using the conventions Ninject extension. The abstract factory convention is nice though: Ninject will automatically inject a generated proxy type that implements the factory interface – you never need a concrete implementation of a factory class!

The add-in’s entry point is located in Rubberdcuk._Extension, the class that the VBE discovers in the Windows Registry as an add-in to load. This class implements the IDTExtensibility2 interface, which looks essentially like this:

public interface IDTExtensibility2
{
    void OnAddInsUpdate(ref Array custom);
    void OnConnection(object Application, ext_ConnectMode ConnectMode, object AddInInst, ref Array custom);
    void OnStartupComplete(ref Array custom);
    void OnBeginShutdown(ref Array custom);
    void OnDisconnection(ext_DisconnectMode RemoveMode, ref Array custom);
}

The Application object is the VBE itself – the very same VBE object you’d get in VBA from the host application’s Application.VBE property, and there are a number of things to consider in how these methods are implemented, but everything essentially starts in OnConnection and ends in OnDisconnection.

So we first get hold a reference to the precious Application and AddInInst objects that we receive here, but because we don’t want a direct dependency on the VBIDE API throughout Rubberduck, we wrap it with a wrapper type that implements our IVBE interface – same for the IAddIn(yes, we wrapped every single type in the VBIDE API type library; that way we can at least try to make Rubberduck work in VB6):

 var vbe = (VBE) Application; 
 _ide = new VBEditor.SafeComWrappers.VBA.VBE(vbe);
 VBENativeServices.HookEvents(_ide);
 
 var addin = (AddIn)AddInInst;
 _addin = new VBEditor.SafeComWrappers.VBA.AddIn(addin) { Object = this };

Then InitializeAddIn is called. That method looks for the configuration settings file, and sets the Thread.CurrentUICulture accordingly. When we know that the settings aren’t disabling the startup splash, we get our build number from the running assembly and bring up the splash screen. Only then do we call the Startup method; when Startup returns (or throws), the splash screen is disposed.

The method is pretty simple:

private void Startup()
{
    var currentDomain = AppDomain.CurrentDomain;
    currentDomain.AssemblyResolve += LoadFromSameFolder;

    _kernel = new StandardKernel(
        new NinjectSettings {LoadExtensions = true}, 
        new FuncModule(), 
        new DynamicProxyModule());
    _kernel.Load(new RubberduckModule(_ide, _addin));

    _app = _kernel.Get<App>();
    _app.Startup();

    _isInitialized = true;
}

We initialize a Ninject StandardKernel, load our module (give it our IVBE and IAddIn object references), get an App object and call its Startup method, where the fun stuff begins:

public void Startup()
{
    EnsureLogFolderPathExists();
    EnsureTempPathExists();
    LogRubberduckSart();
    LoadConfig();
    CheckForLegacyIndenterSettings();
    _appMenus.Initialize();
    _hooks.HookHotkeys(); // need to hook hotkeys before we localize menus, to correctly display ShortcutTexts
    _appMenus.Localize();

    UpdateLoggingLevel();

    if (_config.UserSettings.GeneralSettings.CheckVersion)
    {
        _checkVersionCommand.Execute(null);
    }
}

The method names speak for themselves: we conditionally hit the registry looking for a legacy Smart Indenter key to import indenter settings from, and run the asynchronous “version check” command, which sends an HTTP request to http://rubberduckvba.com/build/version/stable, a URL that merely returns the version number of the build that’s running on the website: by comparing that version with the running version, Rubberduck can let you know when a new version is available.

That’s literally all there is to it: just with that, we have a backbone to build with. If we want a new command, we just implement an ICommand, and if that command goes into a menu we hook it up to a CommandMenuItem class. Commands often delegate their work to more specialized objects, e.g. a refactoring, or a presenter of some sort.

Next post will dive into how Rubberduck’s parser and resolver work.

to be continued…

2.0.14?

Recently I asked on Twitter what the next RD News post should be about.

next-rdnews-post-survey-results

Seems you want to hear about upcoming new features, so… here it goes!


The current build contains a number of breakthrough features; I mentioned an actual Fakes framework for Rubberduck unit tests in an earlier post. That will be an ongoing project on its own though; as of this writing the following are implemented:

  • Fakes
    • CurDir
    • DoEvents
    • Environ
    • InputBox
    • MsgBox
    • Shell
    • Timer
  • Stubs
    • Beep
    • ChDir
    • ChDrive
    • Kill
    • MkDir
    • RmDir
    • SendKey

As you can see there’s still a lot to add to this list, but we’re not going to wait until it’s complete to release it. So far everything we’re hijacking hooking up is located in VBA7.DLL, but ideally we’ll eventually have fakes/stubs for the scripting runtime (FileSystemObject), ADODB (database access), and perhaps even host applications’ own libraries (stabbing stubbing the Excel object has been a dream of mine) – they’ll probably become available as separate plug-in downloads, as Rubberduck is heading towards a plug-in architecture.

The essential difference between a Fake and a Stub is that a Fake‘s return value can be configured, whereas a Stub doesn’t return a value. As far as the calling VBA code is concerned, that’s nothing to care about though: it’s just another member call:

[ComVisible(true)]
[Guid(RubberduckGuid.IStubGuid)]
[EditorBrowsable(EditorBrowsableState.Always)]
public interface IStub
{
    [DispId(1)]
    [Description("Gets an interface for verifying invocations performed during the test.")]
    IVerify Verify { get; }

    [DispId(2)]
    [Description("Configures the stub such as an invocation assigns the specified value to the specified ByRef argument.")]
    void AssignsByRef(string Parameter, object Value);

    [DispId(3)]
    [Description("Configures the stub such as an invocation raises the specified run-time eror.")]
    void RaisesError(int Number = 0, string Description = "");

    [DispId(4)]
    [Description("Gets/sets a value that determines whether execution is handled by Rubberduck.")]
    bool PassThrough { get; set; }
}

So how does this sorcery work? Presently, quite rigidly:

[ComVisible(true)]
[Guid(RubberduckGuid.IFakesProviderGuid)]
[EditorBrowsable(EditorBrowsableState.Always)]
public interface IFakesProvider
{
    [DispId(1)]
    [Description("Configures VBA.Interactions.MsgBox calls.")]
    IFake MsgBox { get; }

    [DispId(2)]
    [Description("Configures VBA.Interactions.InputBox calls.")]
    IFake InputBox { get; }

    [DispId(3)]
    [Description("Configures VBA.Interaction.Beep calls.")]
    IStub Beep { get; }

    [DispId(4)]
    [Description("Configures VBA.Interaction.Environ calls.")]
    IFake Environ { get; }

    [DispId(5)]
    [Description("Configures VBA.DateTime.Timer calls.")]
    IFake Timer { get; }

    [DispId(6)]
    [Description("Configures VBA.Interaction.DoEvents calls.")]
    IFake DoEvents { get; }

    [DispId(7)]
    [Description("Configures VBA.Interaction.Shell calls.")]
    IFake Shell { get; }

    [DispId(8)]
    [Description("Configures VBA.Interaction.SendKeys calls.")]
    IStub SendKeys { get; }

    [DispId(9)]
    [Description("Configures VBA.FileSystem.Kill calls.")]
    IStub Kill { get; }

...

Not an ideal solution – the IFakesProvider API needs to change every time a new IFake or IStub implementation needs to be exposed. We’ll think of a better way (ideas welcome)…

So we use the awesomeness of EasyHook to inject a callback that executes whenever the stubbed method gets invoked in the hooked library. Implementing a stub/fake is pretty straightforward… as long as we know which internal function we’re dealing with – for example this is the Beep implementation:

internal class Beep : StubBase
{
    private static readonly IntPtr ProcessAddress = EasyHook.LocalHook.GetProcAddress(TargetLibrary, "rtcBeep");

    public Beep() 
    {
        InjectDelegate(new BeepDelegate(BeepCallback), ProcessAddress);
    }

    [UnmanagedFunctionPointer(CallingConvention.StdCall, SetLastError = true)]
    private delegate void BeepDelegate();

    [DllImport(TargetLibrary, SetLastError = true)]
    private static extern void rtcBeep();

    public void BeepCallback()
    {
        OnCallBack(true);

        if (PassThrough)
        {
            rtcBeep();
        }
    }
}

As you can see the VBA7.DLL (the TargetLibrary) contains a method named rtcBeep which gets invoked whenever the VBA runtime interprets/executes a Beep keyword. The base class StubBase is responsible for telling the Verifier that an usage is being tracked, for tracking the number of invocations, …and disposing all attached hooks.

The FakesProvider disposes all fakes/stubs when a test stops executing, and knows whether a Rubberduck unit test is running: that way, Rubberduck fakes will only ever work during a unit test.

The test module template has been modified accordingly: once this feature is released, every new Rubberduck test module will include the good old Assert As Rubberduck.AssertClass field, but also a new Fakes As Rubberduck.FakesProvider module-level variable that all tests can use to configure their fakes/stubs, so you can write a test for a method that Kills all files in a folder, and verify and validate that the method does indeed invoke VBA.FileSystem.Kill with specific arguments, without worrying about actually deleting anything on disk. Or a test for a method that invokes VBA.Interaction.SendKeys, without actually sending any keys anywhere.

And just so, a new era begins.


Awesome! What else?

One of the oldest dreams in the realm of Rubberduck features, is to be able to add/remove module and member attributes without having to manually export and then re-import the module every time. None of this is merged yet (still very much WIP), but here’s the idea: a bunch of new @Annotations, and a few new inspections:

  • MissingAttributeInspection will compare module/member attributes to module/member annotations, and when an attribute doesn’t have a matching annotation, it will spawn an inspection result. For example if a class has a @PredeclaredId annotation, but no corresponding VB_PredeclaredId attribute, then an inspection result will tell you about it.
  • MissingAnnotationInspection will do the same thing, the other way around: if a member has a VB_Description attribute, but no corresponding @Description annotation, then an inspection result will also tell you about it.
  • IllegalAnnotationInspection will pop a result when an annotation is illegal – e.g. a member annotation at module level, or a duplicate member or module annotation.

These inspections’ quick-fixes will respectively add a missing attribute or annotation, or remove the annotation or attribute, accordingly. The new attributes are:

  • @Description: takes a string parameter that determines a member’s DocString, which appears in the Object Browser‘s bottom panel (and in Rubberduck 3.0’s eventual enhanced IntelliSense… but that one’s quite far down the road). “Add missing attribute” quick-fix will be adding a [MemberName].VB_Description attribute with the specified value.
  • @DefaultMember: a simple parameterless annotation that makes a member be the class’ default member; the quick-fix will be adding a [MemberName].VB_UserMemId attribute with a value of 0. Only one member in a given class can legally have this attribute/annotation.
  • @Enumerator: a simple parameterless annotation that commands a [MemberName].VB_UserMemId attribute with a value of -4, which is required when you’re writing a custom collection class that you want to be able to iterate with a For Each loop construct.
  • @PredeclaredId: a simple parameterless annotation that translates into a VB_PredeclaredId (class) module attribute with a value of True, which is how UserForm objects can be used without Newing them up: the VBA runtime creates a default instance, in global namespace, named after the class itself.
  • @Internal: another parameterless annotation, that controls the VB_Exposed module attribute, which determines if a class is exposed to other, referencing VBA projects. The attribute value will be False when this annotation is specified (it’s True by default).

Because the only way we’ve got to do this (for now) is to export the module, modify the attributes, save the file to disk, and then re-import the module, the quick-fixes will work against all results in that module, and synchronize attributes & annotations in one pass.

Because document modules can’t be imported into the project through the VBE, these attributes will unfortunately not work in document modules. Sad, but on the flip side, this might make [yet] an[other] incentive to implement functionality in dedicated modules, rather than in worksheet/workbook event handler procedures.

Rubberduck command bar addition

The Rubberduck command bar has been used as some kind of status bar from the start, but with context sensitivity, we’re using these VB_Description attributes we’re picking up, and @Description attributes, and DocString metadata in the VBA project’s referenced COM libraries, to display it right there in the toolbar:

docstrings-in-rdbar.PNG

Until we get custom IntelliSense, that’s as good as it’s going to get I guess.


TokenStreamRewriter

As of next release, every single modification to the code is done using Antlr4‘s TokenStreamRewriter – which means we’re no longer rewriting strings and using the VBIDE API to rewrite VBA code (which means a TON of code has just gone “poof!”): we now work with the very tokens that the Antlr-generated parser itself works with. This also means we can now make all the changes we want in a given module, and apply the changes all at once – by rewriting the entire module in one go. This means the VBE’s own native undo feature no longer gets overwhelmed with a rename refactoring, and it means fewer parses, too.

There’s a bit of a problem though. There are things our grammar doesn’t handle:

  • Line numbers
  • Dead code in #If / #Else branches

Rubberduck is kinda cheating, by pre-processing the code such that the parser only sees WS (whitespace) tokens in their place. This worked well… as long as we were using the VBIDE API to rewrite the code. So there’s this part still left to work out: we need the parser’s token stream to determine the “new contents” of a module, but the tokens in there aren’t necessarily the code you had in the VBE before the parse was initiated… and that’s quite a critical issue that needs to be addressed before we can think of releasing.


So we’re not releasing just yet. But when we do, it’s likely not going to be v2.0.14, for everything described above: we’re looking at v2.1 stuff here, and that makes me itch to complete the add/remove project references dialog… and then there’s data-driven testing that’s scheduled for 2.1.x…

To be continued…

Go ahead, mock VBA

Rubberduck has been offering IDE-integrated unit test since day one.

But let’s face it: unit testing is hard. And unit testing VBA code that pops a MsgBox isn’t only hard, it’s outright impossible! Why? Because it defeats the purpose of an automated test: you don’t want to be okaying message boxes (or worse, clicking No when the test needed you to click Yes), you want to run the tests and watch them all turn green!

So you had to implement some kind of wrapper interface, and write code that doesn’t call MsgBox directly – like the D of SOLID says, depend on abstractions, not on concrete types.

So you’d code against some IMsgBox wrapper interface:

Option Explicit
Public Function Show(ByVal prompt As String, _
 Optional ByVal buttons As VbMsgBoxStyle = vbOKOnly, _
 Optional ByVal title As String = vbNullString, _
 Optional ByVal helpFile As String, _
 Optional ByVal context As Long) As VbMsgBoxResult
End Function

And then you’d implement the concrete type:

Option Explicit
Implements IMsgBox
Private Function IMsgBox_Show(ByVal prompt As String, _
 Optional ByVal buttons As VbMsgBoxStyle = vbOKOnly, _
 Optional ByVal title As String = vbNullString, _
 Optional ByVal helpFile As String, _
 Optional ByVal context As Long) As VbMsgBoxResult
    IMsgBox_Show = MsgBox(prompt, buttons, title, helpFile, context)
End Function

Now that gets you compilable VBA code, but if you want to write a test for code where the result of a MsgBox call can influence the tested method’s code path, you need to make a fake implementation, and inject that FakeMsgBox into your code, so that your code calls not the real MsgBox function, but the fake implementation.

And if you want to verify that the code setup a vbYesNo message box with the company name as a title, you need to adapt your fake message box and make it configurable.

In other words, setting up fakes by hand is a pain in the neck.

So this is where Rubberduck tests are going:

'@TestMethod
Public Sub TestMethod1()
    On Error GoTo TestFail
    
    Fakes.MsgBox.Returns 42
    Debug.Print MsgBox("Flabbergasted yet?", vbYesNo, "Rubberduck") 'prints 42
    
    With Fakes.MsgBox.Verify
        .Parameter "prompt", "Flabbergasted yet?"
        .Parameter "buttons", vbYesNo
        .Parameter "title", "Rubberduck"
    End With
TestExit: 
    Exit Sub
TestFail: 
    Assert.Fail "Test raised an error: #" & Err.Number & " - " & Err.Description
End Sub

Soon. Very soon. Like, next release soon, Rubberduck will begin to allow unit test code to turn the actual MsgBox into a fake one, by setting up a Rubberduck fake.

So yeah, we’re mocking VBA. All of it.

To Be Continued…