About attributes

A few months ago I wrote about how I envisioned dealing with module and member attributes in the 2.1 cycle:

Annotation/Attributes: fixing these inspections, and the quick-fix that synchronizes annotations with module attributes and vice-versa, will finally expose VB module and member attributes to VBA code panes, using Rubberduck’s annotation syntax.

The idea was to use “magic comments” to annotate a module or a procedure with, for example, this:

'@Description("Does something.")
Public Sub DoSomething()
End Sub

And then since we know there’s a DescriptionAnnotation here for DoSomething, and we also know that there’s no VB_Description attribute for that member because of how we parse VBA code… then we’d issue an inspection result saying “annotations and attributes aren’t synchronized”, and you’d run a quickfix and there you have it, a comment that inserts invisible code.

And, it “works”. But there’s a showstopper here: the VBE itself.

You see, the code that’s in the editor is not all the code that’s actually in that file: the VBE is hiding the attributes. So the line positions we get from the VBIDE API aren’t the line positions we’d get from reading the complete code… and this complicates things… rather massively.
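For example, here’s what that DoSomething member looks like in the exported file – the Attribute instruction never shows up in the code pane:

'@Description("Does something.")
Public Sub DoSomething()
Attribute DoSomething.VB_Description = "Does something."
End Sub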

We have one parse pass on the exported code to pick up these attributes, then another pass on the in-editor code to build our knowledge of the code as the user sees it.

So we have two ModuleRewriter instances, each working independently on its own version of the same code. And when the code pane rewriter… rewrites, we run into an annoying VBE quirk I [annoyingly] had already experienced before:

I think that’s a VBE quirk – editing the signature of the method, whether manually or with Rubberduck, tends to obliterate the VB_Attributes (and the shortcut key is a VB_Attribute). — ThunderFrame

We need a “code pane pass” to avoid mangling the code with offset line positions, because the VBE is hiding the attributes.

What’s so important about these attributes anyway? Why bother?

Our resolver needs the critical metadata that’s hidden there: without these attributes, knowing the actual scope of a declaration is impossible. Nowhere else can we pick up a hint about a VB_PredeclaredId attribute, or VB_Exposed. Besides, Rubberduck knowing about VB_Description attributes has very interesting implications, UX-wise. Even better once we handle line breaks:

[Screenshot: Code Explorer showing member docstrings pulled from VB_Description attributes]

The Solution

Of course there’s a solution. We’ve been chatting about this solution for a long time now. There won’t be subclassed code panes. The VBE is literally fighting with everything it’s got, to keep its scrollbars in place and visible – it’s a better idea to pick one’s battles and drop that idea. So instead we’re going to abuse docked toolwindows and put our entire UI in a single such toolwindow – and that includes the current toolwindows/panels, …as well as the code panes client area.

Once we have our very own code pane user control, what happens in it is under our complete supervision, and we’re free to intervene at any point in the process of code being modified, in any conceivable way.

So the idea would be to bring the attributes themselves into the code panes (optionally hidden), and shave off an entire parser pass just like that. This pretty much makes the whole annotation+inspection+quickfix combo rather useless.

So instead of pursuing a lost cause, I’ve decided to drop my work on getting attributes to synchronize with annotations, and get on with the other 2.1 cycle targets: the attributes problem will just solve itself when we get there.

In the meantime I’d recommend disabling the MissingAnnotation and MissingAttribute inspections. Not that the inspections themselves are harmful, but the SynchronizeModuleAttributes quickfix is: that quickfix is being decommissioned.

There is no worksheet.

Your VBA project is embedded in an Excel workbook. It references the VBA standard library; it references the library that exposes the host application’s (i.e. in this case, Excel’s) object model; it includes global-scope objects of types that are declared in these libraries – like Sheet1 (an Excel.Worksheet instance) and ThisWorkbook (an Excel.Workbook instance). These free, global-scope objects are right here to take and run with.

You’re free to use them, wisely.

True, they’re global – they can be accessed from anywhere in the code.
They can… and that doesn’t mean they should.

And if you’re willing to do whatever it takes to abstract away the host application’s object model in your “business logic”, then you can isolate your logic from the worksheet boilerplate and write pretty much the same code you’d be writing in, say, VB.NET… or any other object-oriented language for that matter.


 

Abstracting Worksheets

There is no worksheet. There is data. Data going in, data going out: data is all it is. When the data is coming from a database, many programmers immediately say “I know! Let’s implement a repository pattern!“, or otherwise come up with various ways to abstract away the data-handling boilerplate. If you think of worksheets as data, then it’s not any different, really.

So we shall treat worksheets as such: data. What do we need to do with this data?

Some have tried to make worksheets implement interfaces, and ran into issues (here too, and oh, another). I completely agree with this post, which basically boils down to don’t.

Whatever you do, don’t make worksheets implement an interface.

Wrap them instead. Make a proxy class implement the interface you need, and then make sure everything that needs anything on a worksheet, accesses it through an interface, say IWorkbookData.FooSheet, where FooSheet is a property that returns a FooSheetProxy instance, exposed as an IFooSheet.

[Diagram: calling code depends on IWorkbookData/IFooSheet; FooSheetProxy implements IFooSheet and is the only class that touches the FooSheet worksheet]

The only thing that ever needs to use a FooSheet directly, is a FooSheetProxy.

I don’t know about you, but I don’t like diagrams. So how about some real-world code instead?

Say you have an order form workbook that you distribute to your sales reps, and that comes back filled with customer order details. Now imagine you need a macro that reads the form contents and imports it into your ERP system.

You could write a macro that reads the customer account number in cell O8, the order date in cell J6, the delivery and cancel dates in cells J8 and J10, and loops through rows 33 to 73 to pull the model codes from column F, and the per-size quantities in columns V through AQ… and it would work.

…Until it doesn’t anymore, because an order came back without a customer account number: it’s a new customer, and the data validation wouldn’t let them enter an account that didn’t exist at the time you issued the order form. Or you had to shift all the sized units columns to the right because someone asked if it was possible to enter arbitrary notes at line item level. Or a new category needed to be added and now you have two size scales atop your sized units columns, and you can’t just grab the size codes from row 31 anymore. In other words, it works, until someone else uses (or sees) it and the requirements change.

Sound familiar?

If you’ve written a script-like god-procedure that starts at the top and finishes with a MsgBox "Completed!" call at the bottom (because that’s all VBA is good for, right?), then you’re going to modify your code, increase the cyclomatic complexity with new cases and conditions, and rinse and repeat. Been there, done that.

Not anymore.

Name things.

Abstraction is key. Your code doesn’t really care about what’s in cell O8. Your code needs to know about a customer account number. So you name that range Header_AccountNumber, proceed to name all the things, and before you know it you’re looking at Header_OrderDate, Header_DeliveryDate and Header_CancelDate, and then Details_DetailArea and Details_SizedUnits named ranges; you’ve adjusted your code to use them instead of hard-coding cell references, and that’s already a tremendous improvement: now the code isn’t going to break every time something needs to move around.
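To illustrate – a hypothetical before/after, assuming the worksheet’s code name is OrderSheet:

'before: hard-coded cell addresses break as soon as the layout changes
Dim accountNumber As String
accountNumber = OrderSheet.Range("O8").Value

'after: the named range moves along with the cell it refers to
accountNumber = OrderSheet.Range("Header_AccountNumber").Value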

But you’re still looking at a god-like procedure that does everything, and the only way to test it is to run it: the more complex things are, the less possible it is to cover everything and guarantee that the code behaves as intended in every single corner case. So you download Rubberduck and think “I’m going to write a bunch of unit tests”, and there you are, writing “unit tests” that interact with the real worksheet, firing worksheet events at every change, calculating totals and whatnot: they’re automated tests, but they’re not unit tests. You simply can’t write unit tests against a god-like macro procedure that knows everything and does everything. The abstraction level needs to go further up.

The [Order Form] worksheet has a code-behind module. Here’s mine:

'@Folder("OrderForm.OrderInfo")
Option Explicit

Yup. That’s absolutely all of it. Your princess is in another castle. A “proxy” class implements an interface that the rest of the code uses. The interface exposes everything we’re going to need – something like this:

'@Folder("OrderForm.OrderInfo")
Option Explicit

Public Property Get AccountNumber() As String
End Property

Public Property Get OrderNumber() As String
End Property

'...

Public Function CreateOrderHeaderEntity() As ISageEntity
End Function

Public Function CreateOrderDetailEntities() As VBA.Collection
End Function

Public Sub Lockdown(Optional ByVal locked As Boolean = True)
End Sub

Then there’s a proxy class that implements this interface; the AccountNumber property implementation might look like this:

Private Property Get IOrderSheet_AccountNumber() As String
    Dim value As Variant
    value = OrderSheet.Range("Header_AccountNumber").value
    If Not IsError(value) Then IOrderSheet_AccountNumber = value
End Property

And then the CreateOrderHeaderEntity creates and returns an object that your “import into ERP system” macro will consume, using the named ranges defined on OrderSheet. Now instead of depending directly on OrderSheet, your macro depends on this OrderSheetProxy class, and you can even refactor the macro into its own class and make it work against an IOrderSheet instead.

What gives? Well, now that you have code that works off an IOrderSheet interface, you can write some OrderSheetTestProxy implementation that doesn’t even know or care about the actual OrderSheet worksheet – and just like that, you can write unit tests that don’t use any worksheet at all, and still automatically test the entire set of functionalities!
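A test proxy for the members shown above could be as dumb as this – a sketch, with made-up values; any members elided from the interface would need stubbing too:

'@Folder("OrderForm.Tests.Stubs")
Option Explicit
Implements IOrderSheet

Private Property Get IOrderSheet_AccountNumber() As String
    IOrderSheet_AccountNumber = "TEST-0001" 'made-up account number
End Property

Private Property Get IOrderSheet_OrderNumber() As String
    IOrderSheet_OrderNumber = "SO-0042" 'made-up order number
End Property

Private Function IOrderSheet_CreateOrderHeaderEntity() As ISageEntity
    'return a pre-configured test entity here, or Nothing if the test doesn't need one
    Set IOrderSheet_CreateOrderHeaderEntity = Nothing
End Function

Private Function IOrderSheet_CreateOrderDetailEntities() As VBA.Collection
    Set IOrderSheet_CreateOrderDetailEntities = New VBA.Collection
End Function

Private Sub IOrderSheet_Lockdown(Optional ByVal locked As Boolean = True)
    'no-op: there is no worksheet to protect in a unit test
End Sub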


Of course this isn’t the full picture, but it gives an idea. A recent order form project of mine currently contains 86 class modules, 3 standard modules, 11 user forms, and 25 worksheets (total worksheet code-behind procedures: 0) – not counting anything test-related – and using this pattern (combined with MVP), the code is extremely clear and surprisingly simple; most macros look more or less like this:

Public Sub AddCustomerAccount()
    Dim proxy As IWorkbookData
    Set proxy = New WorkbookProxy
    If Not proxy.SettingsSheet.EnableAddCustomerAccounts Then
        MsgBox MSG_FeatureDisabled, vbExclamation
        Exit Sub
    End If

    With New AccountsPresenter
        .Present proxy
    End With
End Sub

Without abstraction, this project would be a huge unscalable copy-pasta mess, impossible to extend or maintain, let alone debug.

See, there is no worksheet!

UserForm1.Show

I’ve seen these tutorials. You’ve probably seen them too. They all go “see how easy it is?!” when they end with a glorious UserForm1.Show without explaining anything about what it means for your code and your understanding of programming concepts, to use a form’s default instance like this. Most don’t even venture into explaining anything about that default instance – and off you go, see you on Stack Overflow.

Because if you don’t know what you’re doing, all you’ve learned is how to write code that, in the name of “hey look it’s so easy”, abstracts away crucially important concepts that will, sooner or later, come back to bite you in the …rear end.

What’s that default instance anyway?

A UserForm is essentially a class module with a designer and a VB_PredeclaredId attribute. That PredeclaredId means VBA automatically creates a global-scope instance of the class, named after the class. If the default instance is ever unloaded or set to Nothing, its internal state gets reset, and the instance is automatically reinitialized as soon as it is invoked again. You can Set UserForm1 = Nothing all you want; you can never verify whether UserForm1 Is Nothing, because that expression will always evaluate to False. A default instance is nice for, say, exposing a factory method. But please, please don’t Show the default instance.
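For example, a hypothetical Person class (a class module named Person, with its VB_PredeclaredId attribute set to True) can expose a Create factory method that’s meant to be invoked off the default instance:

Option Explicit

Private Type TPerson
    Name As String
End Type

Private this As TPerson

Public Property Get Name() As String
    Name = this.Name
End Property

Friend Property Let Name(ByVal value As String)
    this.Name = value
End Property

Public Function Create(ByVal value As String) As Person
    'invoked off the default instance: the returned object carries the state
    Set Create = New Person
    Create.Name = value
End Function

Calling code then says Set someone = Person.Create("Carl"), and never stores any state on the default instance itself.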


Doing. It. Wrong.™

There are a number of red flags invariably raised in many UserForm tutorials:

  • Unload Me, or worse, Unload UserForm1, in the form’s code-behind. The former makes the form instance a self-destructing object; the latter resets the default instance – which isn’t necessarily the executing instance – and that leads to all kinds of funky unexpected behavior, and embarrassing duplicate questions on Stack Overflow. Every day.
  • UserForm1.Show at the call site, where UserForm1 isn’t a local variable but the “hey look it’s free” default instance, which means you’re using an object without even realizing it (at least without New-ing it up yourself) – and you’re storing state that belongs to a global instance, which means you’re using an object but without the benefits of object-oriented programming. It also means that…
  • The application logic is implemented in the form’s code-behind. In programming this [anti-]pattern has a name: the “smart UI”. If a dialog does anything beyond displaying and collecting data, it’s doing someone else’s job. That piece of logic is now coupled with the UI, and it’s impossible to write a unit test for it. It also means you can’t possibly reuse that form for something else in the same project (heck, or for something similar in another project) without making considerable changes to the form’s code-behind. A form that’s used in 20 places and runs the show for 20 functionalities, can’t possibly be anything other than a spaghetti mess.

So that’s what not to do. Flipside.


Doing it right.

What you want at the call site is to show an instance of the form, let the user do their thing, and when the dialog closes, have the calling code pull the data from the form’s state. This means you can’t afford a self-destructing form that wipes out its entire state before the [Ok] button’s Click handler even returns.

Hide it, don’t Unload it.

In .NET’s Windows Forms UI framework (WinForms, the .NET successor of MSForms), a form’s ShowDialog method is a function that returns a DialogResult enum value, a bit like MsgBox does. Makes sense; that return value tells the caller what the user meant to do with the form’s state: OK being your green light to process it, Cancel meaning the user chose not to proceed – and your program is supposed to act accordingly.

You see, Show-ing a dialog isn’t some fire-and-forget business: if the caller is going to be responsible for knowing what to do when the form is okayed or cancelled, then it’s going to need to know whether the form was okayed or cancelled.

And a form can’t tell its caller anything if clicking the [Ok] button nukes the form object.

The basic code-behind for a form with an [Ok] and a [Cancel] button could look like this:

Option Explicit
'@Folder("UI")
Private cancelled As Boolean

Public Property Get IsCancelled() As Boolean
    IsCancelled = cancelled
End Property

Private Sub OkButton_Click()
    Hide
End Sub

Private Sub CancelButton_Click()
    OnCancel
End Sub

Private Sub UserForm_QueryClose(Cancel As Integer, CloseMode As Integer)
    If CloseMode = VbQueryClose.vbFormControlMenu Then
        Cancel = True
        OnCancel
    End If
End Sub

Private Sub OnCancel()
    cancelled = True
    Hide
End Sub

Notice there are two ways to cancel the dialog: the [Cancel] button, and the [X] button – which would also nuke the object instance if Cancel = True weren’t specified in the QueryClose handler. Handling QueryClose is fundamental: without it, even if you’re not Unload-ing the form anywhere, [X]-ing out of it will inevitably cause issues, because the calling code has every right not to expect a self-destructing object. You need the form’s object reference to stick around, so the caller can verify whether the form was cancelled when .Show returns.

The calling code looks like this:

With New UserForm1
    .Show
    If Not .IsCancelled Then
        '...
    End If
End With

Notice there’s no need to declare a local variable; the With New syntax yields the object reference to the With block, which properly destroys the object whenever the With block is exited – hence why GoTo-jumping out and then back into a With block is never a good idea; this can happen accidentally, with a Resume or Resume Next instruction in an error-handling subroutine.

The Model

A dialog displays and collects data. If the caller needs to know about a UserName and a Password, it doesn’t need to care about some userNameBox and passwordBox textbox controls: what it cares about, is the UserName and the Password that the user provided in these controls – the controls themselves, the ability to hide them, move them, resize them, change their font and border style, etc., is utterly irrelevant. The calling code doesn’t need controls, it needs a model that encapsulates the form’s data.

[Screenshot: a simple login form with user name and password fields]

In its simplest form, the model can take the shape of a few Property Get members in the form’s code-behind:

Public Property Get UserName() As String
    UserName = userNameBox.Text
End Property

Public Property Get Password() As String
    Password = passwordBox.Text
End Property

Or better, it could be a full-fledged class, exposing Property Get and Property Let members for every property.

The calling code can now get the form’s data without caring about controls – without knowing that the UserName was entered in a TextBox control, or that the Password came from a passwordBox with its PasswordChar set to *.

Except, it still can – form controls are basically public instance fields on the form object: the caller can happily access them at will… and this leaves the interesting UserName and Password properties kind of lost in a sea of MSForms boilerplate in IntelliSense. So you implement the model in its own class module instead, and use composition to encapsulate it:

Private viewModel As LoginDialogModel

Public Property Get Model() As LoginDialogModel
    Set Model = viewModel
End Property

Public Property Set Model(ByVal value As LoginDialogModel)
    Set viewModel = value
End Property

The model could be updated by the textboxes – it could even expose Boolean properties that can be used to enable/disable the [Ok] button, or show/hide a validation error icon:

Private Sub userNameBox_Change()
    viewModel.UserName = userNameBox.Text
    ValidateForm
End Sub

Private Sub passwordBox_Change()
    viewModel.Password = passwordBox.Text
    ValidateForm
End Sub

Private Sub ValidateForm()
    okButton.Enabled = viewModel.IsValidModel
    userNameValidationErrorIcon.Visible = viewModel.IsInvalidUserName
    passwordValidationErrorIcon.Visible = viewModel.IsInvalidPassword
End Sub
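The LoginDialogModel class itself isn’t shown here; a minimal sketch of what it could look like (the validation rules below are made up for illustration) would be:

'@Folder("UI")
Option Explicit

Private Type TModel
    UserName As String
    Password As String
End Type

Private this As TModel

Public Property Get UserName() As String
    UserName = this.UserName
End Property

Public Property Let UserName(ByVal value As String)
    this.UserName = value
End Property

Public Property Get Password() As String
    Password = this.Password
End Property

Public Property Let Password(ByVal value As String)
    this.Password = value
End Property

Public Property Get IsInvalidUserName() As Boolean
    IsInvalidUserName = Len(this.UserName) = 0 'made-up rule: user name is required
End Property

Public Property Get IsInvalidPassword() As Boolean
    IsInvalidPassword = Len(this.Password) < 8 'made-up rule: minimum 8 characters
End Property

Public Property Get IsValidModel() As Boolean
    IsValidModel = Not IsInvalidUserName And Not IsInvalidPassword
End Property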

Now, a problem remains: the caller doesn’t want to see the form’s controls.

The View

So we have a model abstraction that the view can consume, but we don’t have an abstraction for the view. That should be simple enough – let’s add a new class module and define a general-purpose IView interface:

Option Explicit
'@Folder("Abstractions")
'@Interface

Public Function ShowDialog(ByVal viewModel As Object) As Boolean
End Function

Now the form can implement that interface – and because the interface is exposing that ShowDialog method, we don’t need a public IsCancelled property anymore. I’m introducing a Private Type at this point, because I like having only one private field:

Option Explicit
Implements IView
'@Folder("UI")

Private Type TView
    IsCancelled As Boolean
    Model As LoginDialogModel
End Type

Private this As TView

Private Sub OkButton_Click()
    Hide
End Sub

Private Sub CancelButton_Click()
    OnCancel
End Sub

Private Sub UserForm_QueryClose(Cancel As Integer, CloseMode As Integer)
    If CloseMode = VbQueryClose.vbFormControlMenu Then
        Cancel = True
        OnCancel
    End If
End Sub

Private Sub OnCancel()
    this.IsCancelled = True
    Hide
End Sub

Private Function IView_ShowDialog(ByVal viewModel As Object) As Boolean
    Set this.Model = viewModel
    Show
    IView_ShowDialog = Not this.IsCancelled
End Function

The interface can’t be general-purpose if the Model property is of a type more specific than Object, but it doesn’t matter: the code-behind gets IntelliSense and early-bound, compile-time validation of member calls against the model, because the private field holding it (typed as LoginDialogModel) is an implementation detail – this particular IView implementation is a “login dialog” with a LoginDialogModel; the interface doesn’t need to know, only the implementation.

The [Ok] button will only ever be enabled if the model is valid – that’s one less thing for the caller to worry about, and the logic addressing that concern is neatly encapsulated in the model class itself.

The calling code is supplying the model, so its type is known to the caller – in fact that Property Get member is just provided as a convenience, because it makes little sense to Set a property without being able to Get it later.

Speaking of the calling code, with the addition of a Self property to the model class (Set Self = Me), it could look like this now:

Public Sub Test()
    Dim view As IView
    Set view = New LoginForm

    With New LoginDialogModel
        If Not view.ShowDialog(.Self) Then Exit Sub
        'consume the model:
        Debug.Print .UserName, .Password
    End With 'model goes out of scope

End Sub 'view goes out of scope

If you read the previous article about writing unit-testable code, you’re now realizing (if you haven’t already) that this IView interface could be implemented by some MockLoginDialog class whose ShowDialog returns a test-configured value; unit tests could then be written against any code that consumes an IView rather than an actual LoginForm, so long as the calling code is responsible for knowing which specific IView implementation it’s interacting with.
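Such a fake could be as simple as this – a sketch, where the public DialogResult field is just the easiest way for a test to configure the value ShowDialog returns:

'@Folder("Tests.Fakes")
Option Explicit
Implements IView

'configure from the test: True simulates a dialog the user okayed
Public DialogResult As Boolean

Private Function IView_ShowDialog(ByVal viewModel As Object) As Boolean
    'a more elaborate fake could also capture viewModel for later assertions
    IView_ShowDialog = DialogResult
End Function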

The model’s validation logic could be unit-tested, too:

Const value As String = "1234"
With New LoginDialogModel
    .Password = value
    Assert.IsTrue .IsInvalidPassword, "'" & value & "' should be invalid."
End With

With a Model and a View, you’re one step away from introducing a Presenter class, an abstraction that completes the MVP pattern – a much more robust way to write UI code than a Smart UI is.

How to unit test VBA code?

So Rubberduck lets you write unit tests for your VBA code. If you’re learning VBA, or if you’re a seasoned VBA developer but have never written a unit test before, this can sound like a crazy useless idea. I know, because before I started working on Rubberduck, that’s how I was seeing unit tests: annoying, redundant code that tells you nothing F5/debugging doesn’t already tell you.

Right? What’s the point anyway?

First, it changes how you think about code. Things like the Single Responsibility Principle start becoming freakishly important, and you begin to break that monolithic macro procedure into smaller, more focused chunks. Future you, or whoever inherits your code, will be extremely thankful for that.

But not all VBA code should be unit-tested. Let’s see why.

Know what NOT to test

All code has dependencies. Some of these dependencies we can’t really do without, and don’t really affect anything – global-scope functions in the VBA Standard Library, for example. Other dependencies affect global state, require user input (MsgBox, InputBox, forms, dialogs, etc.) or access external resources – like a database, the file system, …or a worksheet.

For the sake of this article, say you have a simple procedure like this:

Public Sub DoSomething()
    Dim conn As ADODB.Connection
    Set conn = New ADODB.Connection
    conn.ConnectionString = "{connection string}"
    conn.Open
    Dim rs As ADODB.Recordset
    Set rs = conn.Execute("SELECT * FROM dbo.SomeTable")
    Sheet1.Range("A1").CopyFromRecordset rs
    conn.Close
End Sub

The problematic dependencies are:

  • conn, an ADODB connection
  • rs, an ADODB recordset
  • Sheet1, an Excel worksheet

Is that procedure doomed and completely untestable? Well, as is, …pretty much: the only way to write a test for this procedure would be to actually run it, and verify that something was dumped into Sheet1. In fact, that’s pretty much automating F5-debug: it’s an integration test, not a unit test – it’s a test, but it’s validating that all components work together. It’s not useless, but that’s not a unit test.

Refactoring

The procedure needs to be parameterless, because it’s invoked from some button: so we have a major problem here – there’s no way to factor out the dependencies!

Or is there? What if we introduced a class, and moved the functionality in there?

Now we’d be looking at this:

Public Sub DoSomething()
    With New MyTestableMacro
        .Run
    End With
End Sub

At this point we tremendously increased the macro’s abstraction level and that’s awesome, but we didn’t really gain anything. Or did we? Now that we’ve decoupled the macro’s entry point from the implementation, we can pull out the dependencies and unit-test the MyTestableMacro class! But how do we do that?

Think in terms of concerns:

  • Pulling data from a database
  • Writing the data to a worksheet

Now think in terms of objects:

  • We need some data service responsible for pulling data from a database
  • We need some spreadsheet service responsible for writing data to a worksheet

The macro might look like this now:

Public Sub DoSomething()

    Dim dataService As MyDbDataService
    Set dataService = New MyDbDataService

    Dim spreadsheetService As Sheet1Service
    Set spreadsheetService = New Sheet1Service

    With New MyTestableMacro
        .Run dataService, spreadsheetService
    End With

End Sub

Now if we think of MyDbDataService as an interface, we could conceptualize it like this:

Option Explicit
'@Folder "Services.Abstract"
'@Interface IDataService

Public Function GetSomeTable() As Variant
End Function

And if we think of Sheet1Service as an interface, we could conceptualize it like this:

Option Explicit
'@Folder "Services.Abstract"
'@Interface IWorksheetService

Public Sub WriteAllData(ByRef data As Variant)
End Sub

Notice the interfaces don’t know or care about ADODB.Recordset: the last thing we want is to have that dependency in our way, so we’ll be passing a Variant array around instead of a recordset.

Now the Run method’s signature might look like this:

Public Sub Run(ByVal dataService As IDataService, ByVal wsService As IWorksheetService)
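Its body is then little more than wiring the two services together – a minimal sketch, assuming nothing else needs to happen in between:

Public Sub Run(ByVal dataService As IDataService, ByVal wsService As IWorksheetService)
    Dim data As Variant
    data = dataService.GetSomeTable
    wsService.WriteAllData data
End Sub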

Notice it only knows about abstractions, not the concrete implementations. All that’s missing is to make MyDbDataService implement the IDataService interface, and Sheet1Service implement the IWorksheetService interface.

Option Explicit
Implements IDataService
'@Folder "Services.Concrete"

Private Function IDataService_GetSomeTable() As Variant
    Dim conn As ADODB.Connection
    Set conn = New ADODB.Connection
    conn.ConnectionString = "{connection string}"
    conn.Open
    Dim rs As ADODB.Recordset
    Set rs = conn.Execute("SELECT * FROM dbo.SomeTable")
    'dump the recordset onto a temp sheet:
    Dim tempSheet As Excel.Worksheet
    Set tempSheet = ThisWorkbook.Worksheets.Add
    tempSheet.Range("A1").CopyFromRecordset rs
    IDataService_GetSomeTable = tempSheet.UsedRange.Value '2D variant array
    conn.Close
    tempSheet.Delete
End Function
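The Sheet1Service counterpart could be as small as this – a sketch, assuming the data comes in as the 1-based 2D variant array the data service produces:

Option Explicit
Implements IWorksheetService
'@Folder "Services.Concrete"

Private Sub IWorksheetService_WriteAllData(ByRef data As Variant)
    'resize the target range to fit the array, then write it in a single assignment
    Sheet1.Range("A1").Resize(UBound(data, 1), UBound(data, 2)).Value = data
End Sub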

Stubbing the interfaces

So here’s where the magic begins: the macro will definitely be using the above implementation, but nothing forces a unit test to use it too. A unit test would be happy to use something like this:

Option Explicit
Implements IDataService
'@Folder "Services.Stubs"

Private Function IDataService_GetSomeTable() As Variant
    Dim result(1 To 50, 1 To 10) As Variant
    IDataService_GetSomeTable = result
End Function

Public Function GetSomeTable() As Variant
    GetSomeTable = IDataService_GetSomeTable
End Function

You could populate the array with some fake results, expose properties and methods to configure the stub in every way your tests require (depending on what logic needs to run against the data after it’s dumped onto the worksheet) – for this example though all we need is for the method to return a 2D variant array, and the above code satisfies that.

Then we need a stub for the IWorksheetService interface, too:

Option Explicit
Implements IWorksheetService
'@Folder "Services.Stubs"

Private written As Boolean
Private arrayPointer As Long

Private Sub IWorksheetService_WriteAllData(ByRef data As Variant)
    written = True
    arrayPointer = VarPtr(data)
End Sub

Public Property Get DataWasWritten() As Boolean
    DataWasWritten = written
End Property

Public Property Get WrittenArrayPointer() As Long
    WrittenArrayPointer = arrayPointer
End Property

Writing the tests

That’s all our test needs for now. See where this is going? DoSomething is using concrete implementations of the service interfaces that actually do the work, and a unit test can look like this:

'@TestMethod
Public Sub GivenData_WritesToWorksheet()
    'Arrange
    Dim dataServiceStub As MyDataServiceStub
    Set dataServiceStub = New MyDataServiceStub
    Dim wsServiceStub As MyWorksheetServiceStub
    Set wsServiceStub = New MyWorksheetServiceStub

    'Act
    With New MyTestableMacro
        .Run dataServiceStub, wsServiceStub
    End With

    'Assert
    Assert.IsTrue wsServiceStub.DataWasWritten
End Sub

If MyTestableMacro.Run invokes IWorksheetService.WriteAllData, this test will pass.

One more:

'@TestMethod
Public Sub WorksheetServiceWorksOffDataFromDataService()
    'Arrange
    Dim dataServiceStub As MyDataServiceStub
    Set dataServiceStub = New MyDataServiceStub
    Dim expected As Long
    expected = VarPtr(dataServiceStub.GetSomeTable)

    Dim wsServiceStub As MyWorksheetServiceStub
    Set wsServiceStub = New MyWorksheetServiceStub

    'Act
    With New MyTestableMacro
        .Run dataServiceStub, wsServiceStub
    End With

    Dim actual As Long
    actual = wsServiceStub.WrittenArrayPointer

    'Assert
    Assert.AreEqual expected, actual
End Sub

If the worksheet service receives the exact same array that the data service returned, this test should pass.

That was a relatively trivial example – the overhead (5 classes, including 2 interfaces and 2 stub implementations) is probably not justified given the simplicity of the task at hand (pull data from a database, dump that data to a worksheet). But hopefully it illustrates a number of things:

  • How to pull dependencies out of the logic that needs to be tested.
  • How to abstract the dependencies as interfaces.
  • How to implement test stubs for these dependencies, and how stubs can expose members that aren’t on the interface, for the tests to consume.
  • How unit tests document what the code is supposed to be doing, through descriptive naming.
  • VBA code can be just as object-oriented as any other code, with full-blown polymorphism and dependency injection.

Next tutorial should be about MSForms.UserForm, how not to use it, and how to test code that needs to pop a dialog. I didn’t mention anything about Rubberduck’s Fakes framework here either, but know that if one of your dependencies is a MsgBox and you have different code paths depending on whether the user clicked [Ok] or [Cancel], you can use Rubberduck’s Fakes API to literally configure how the MsgBox function is going to behave when it’s invoked by a Rubberduck test.

Rubberduck 2.1.x

The release was going to include a number of important fixes for the missing annotation/attribute inspections and quick-fix, but instead we disabled them, along with a few other buggy inspections, and pushed the release. Seven months after 2.0.13, the last release was now over 1,300 commits behind; we knew a “green release” was imminent, but we were also reaching a point where we were going to have to make more changes to parts of the core – notably in order to implement the fixes for these broken annotation/attribute inspections.

So we shipped what we had, because we wouldn’t jeopardize the 2.1 release with parser logic changes at that point.

Crossroads

[Photo: a wooden signpost at the crossroads]
By Steve Hillebrand, U.S. Fish and Wildlife Service [Public domain], via Wikimedia Commons
So here we are, at the crossroads: with v2.1.0 released, things are going to snowball – there’s a lot on our plates, but we now have a solid base to build upon. Here’s what’s coming:

  • Castle Windsor IoC: hopefully-zero user-facing changes, we’re replacing good old Ninject with a new dependency injection framework in order to gain finer control over object destruction – we will end up correctly unloading!

That’s actually priority one: the port is currently under review on GitHub, and pays a fair amount of long-standing technical debt, especially with everything involving menus.

  • Annotation/Attributes: fixing these inspections, and the quick-fix that synchronizes annotations with module attributes and vice-versa, will finally expose VB module and member attributes to VBA code panes, using Rubberduck’s annotation syntax.

For example, adding '@Description("This procedure does XYZ") on top of a procedure will tell Rubberduck that you mean that procedure to have a VB_Description attribute; when Rubberduck parses that module after you synchronize, it will be able to use that description in the context status bar, or as tooltips in the Code Explorer.

This is considered a serious issue, because it affects pretty much every single inspection. Luckily there’s a [rather annoying and not exactly acceptable] work-around (apply the fix bottom-to-top in a module), but still.

But there’s a Greater Picture, too.

The 2.1.x Cycle

At the end of this development cycle, Rubberduck will:

  • Work in the VB6 IDE;
  • Have formalized the notion of an experimental feature;
  • Have a working Extract Method refactoring;
  • Make you never want to use the VBE’s Project References dialog ever again;
  • Compute and report various code metrics, including cyclomatic complexity and nesting levels, and others (and yes, line count too);
  • Maybe analyze a number of execution paths and implement some of the coolest code inspections we could think of;
  • Be ready to get really, really serious about a tear-tab AvalonEdit code pane.

If all you’re seeing is Rubberduck’s version check, the next version you’ll be notified about will be 2.1.2, for which we’re shooting for 2017-11-13. If you want to try every build until then (or just a few), then you’ll want to keep an eye on our releases page!

Contributor Swag!

Rubberduck is free, and always will be.

But shipping “thank you” packages to the project’s contributors isn’t. I need your help!

You use Rubberduck? You like what we’re doing? Over the few years the project has been running, we’ve received extremely motivating positive feedback, tons of feature requests, and over 420 GitHub stars. And we still get that warm fuzzy every time.

Without Rubberduck’s contributors, we wouldn’t be anywhere near where we are today. There wouldn’t be over 50 implemented code inspections, the parser/resolver core would still be terribly slow and approximate, the Smart Indenter port wouldn’t be completed, there would be nobody to work on and fix the source control integration feature; there wouldn’t be a German translation, a regular expression analyzer tool, and so on: the project’s contributors are the real wizards shaping a dream into a reality… and I want to thank them for that, on behalf of all our users, stargazers and followers.

I can’t do this alone, so I started a GoFundMe campaign to collect enough funds to get some Rubberduck swag made and shipped to the project’s contributors.


During the first 24 hours of the campaign, $365 were donated by only 5 contributors – so I decided to match that amount (a little bit more, actually) and proceeded to order some of the items. As of now, though, the campaign is only 30% funded.

So what’s the actual swag?

The Rubberduck tee has the project’s name and website on the front, and the GitHub repository URL on the back.

The Rubberduck mug has the same assets, on white and black backgrounds. Both the tees and the mugs are ordered from SpreadShirt.

There’s more to the thank you packages, but I’d like to try to keep the rest a surprise… so I’m not saying anything else about it!

Nice! Can I haz a mug?

Simple: contribute to Rubberduck, and help us bring the VBE into this century! …actually, once I have the packages ready to ship (and the funds to ship them!), I’ll know how much overstock I’m left with (if any). Expect the biggest donators to get a little something too, and if there’s still some left I might run a little contest among Rubberduck users and supporters, so… stay tuned!

Inside Rubberduck (pt.2)

https://www.cgtrader.com/free-3d-print-models/art/sculptures/rubber-duck-voronoi-style
“Rubber Duck” – Voronoi Style Free 3D print model by Roman Hegglin

Last time I went over the startup and initialization of Rubberduck, and I said I was going to follow it up with how the parser and resolver work.

Just so happens that Max Dörner, who has pretty much owned the parser and resolver parts of Rubberduck since he joined the project (that’s right – jumped head-first into some of the toughest, most complicated code in the project!), has nicely documented the highlights of how parsing and resolving works.

So yeah, all I did here was type up an intro. Buckle up, you’re in for a ride!

Part 2: Parsing & Resolving

Rubberduck processes the code in all unprotected modules in a five-step process. First, in the parser state Pending, the projects and modules to parse are determined. Then, in the parser state LoadingReferences, the references currently used by the projects, e.g. the Excel object model, and some built-in declarations are loaded into Rubberduck. Following this, the actual processing of the code begins. Between the parser states Parsing and Parsed the code gets parsed into parse trees with the help of Antlr4. Following this, between the states ResolvingDeclarations and ResolvedDeclarations the module, method and variable declarations are generated based on the parse tree. Finally, between the states ResolvingReferences and Ready the parse trees are walked a second time to determine the references to the declarations within the code.

At each state change, an event is fired which can be handled by any feature subscribing to it, e.g. the CodeExplorer, which listens for the state change to ResolvedDeclarations.

A More Detailed Story

The entry point for the parsing process is the ParseCoordinator inside the Rubberduck.Parsing assembly. It coordinates the parsing process and is responsible for triggering the appropriate state changes at the right time, for which it uses an IParserStateManager passed to it. To trigger the different stages of the parsing process, the ParseCoordinator uses an IParsingStageService. This is a facade passed to it, providing a unified interface for calling the individual stages, which are all implemented in individual sets of classes. Each has a concurrent version for production and a synchronous one for testing. The latter was needed because of concurrency problems with the mocking framework.

General Logistics

Every parsing run gets executed in a fresh background task. Moreover, to always be in a consistent state, we allow only one parsing run to execute at a time. This is achieved by acquiring a lock in a top-level method. This top-level method is also the point at which any cancellation or unexpected exception will be caught and logged.

The first step of the actual parsing process is to set the overall parser state to Pending. This signals to all components of Rubberduck that we left a fully usable state. Afterwards, we refresh the projects cache on the RubberduckParserState, asking the VBE for the loaded projects, and then acquire a collection of the modules currently present.

Loading References

After setting the overall parser state to LoadingReferences, the declarations for the project references, i.e. the references selected in Tools –> References..., get loaded into Rubberduck. This is done using the ReferencedDeclarationsCollector in the Rubberduck.Parsing.ComReflection namespace, which reads the appropriate type libraries and generates the corresponding declarations.

Note that the order in the References dialog determines what procedure or field an identifier resolves to in VBA if two or more references define a procedure or field of the same name. This prioritization is taken into account when loading the references.

Unfortunately, we are currently not able to load all built-in declarations from the type libraries: there are some hidden members of the MSForms library, some special syntax declarations like LBound and everything related to Debug, and aliases for built-in functions like Left, where Left is the alias for the actual hidden function defined in the VBA type library. These get loaded as a set of hand-crafted declarations defined in the Rubberduck.Parsing.Symbols.DeclarationLoaders namespace.

Parsing the Code

At the start of the processing of the actual code, the parser state is set to Parsing. However, this time this is achieved by setting the individual module states of the modules to be parsed and then evaluating the overall state.

Each module gets parsed separately using an individual ComponentParseTask from the Rubberduck.Parsing.VBA namespace, which is powered by the Antlr4 parser generator. The end result is a pair of parse trees providing a structured representation of the code: one as seen in the VBE, and one as exported to file.

The general process using Antlr is to provide the code to a lexer that turns the code into a stream of tokens based on lexer rules. (The lexer rules used in Rubberduck can be found in the file VBALexer.g4 in the Rubberduck.Parsing.Grammar namespace.) Then this token stream gets processed by a parser that generates a parse tree based on the stream and a set of parser rules describing the syntactic rules of the language. (The VBA parser rules used in Rubberduck can be found in the file VBAParser.g4 in the Rubberduck.Parsing.Grammar namespace; there are, however, more specialized rules in the project.) The parse tree then consists of nodes of various types corresponding to the parser rules.

Even when counting the Antlr workflow described above as one step, the actual parsing process in the ComponentParseTask is a multi-stage process in itself. There are two reasons for this: there are precompiler directives in VBA, and some information regarding modules – namely attributes – is hidden from the user inside the VBE.

The precompiler directives in VBA allow conditionally selecting which code is live. This makes it possible to write code that would only be legal VBA after evaluating the conditional compilation directives; accordingly, this evaluation has to happen before the code reaches the parser. To achieve this, we parse each module first with a specialized grammar for the precompiler directives, and then hide all tokens that are dead after the evaluation from the VBA parser – including the precompiler directives themselves – by sending those tokens to a hidden channel in the token stream. Afterwards, the dead code is still part of the text representation of the token stream, but is disregarded by the parser.
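For example, in a module like this (a contrived illustration), only one branch ever reaches the VBA parser as live code:

#Const DEV = False

Public Sub DoSomething()
#If DEV Then
    'dead branch: these tokens get sent to a hidden channel and never reach the VBA parser
    Debug.Print "dev build"
#Else
    'live branch: this is the code the VBA parser actually sees
    Debug.Print "release build"
#End If
End Sub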

To cover the attributes, which are only present in the exported modules, and still provide meaningful line numbers in inspection results, errors and the command bar, we parse both the exported code and the code as seen in the VBE code pane into separate parse trees, and save both on the ModuleState belonging to the module on the RubberduckParserState.

One thing of note is that Antlr provides two different kinds of parsers: the LL parser, which basically parses all valid input for every grammar that is not indirectly left-recursive (our VBA grammar satisfies this), and the SLL parser, which is considerably faster but cannot necessarily parse all valid input for all such grammars. Both parsers are guaranteed to yield the same result whenever the parse succeeds at all. Since the SLL parser works for next to all commonly encountered code, we first parse using it and fall back to the LL parser if there is a parser error.

Following the parse, the state of the module is set to Parsed on a successful parse, and to ParserError otherwise. After all modules have finished parsing, the overall parser state is evaluated. If there has been any parser error, the parsing process ends here.

Resolving Declarations

After parsing the code into parse trees, it is time to generate the declarations for the procedures, functions, properties, variables and arguments in the code.

First, the state of all modules gets set to ResolvingDeclarations, analogous to the start of parsing the code. Then the tree walker and listener infrastructure of Antlr is used to traverse the parse trees and generate declarations whenever the appropriate grammar constructs are encountered. This is done inside the implementations of IDeclarationResolveRunner in the Rubberduck.Parsing.VBA namespace.

Note that there is still some information missing on the declarations at this point that cannot be determined in this first pass over the parse trees. For example, the supertypes of classes implementing the interface of another class are not known yet; and although the name of the type of each declaration is already known, the actual type might not be. In both cases, we first have to know all declarations.

After the parse trees of all modules have been walked, the overall parser state gets set to ResolvedDeclarations, unless there has been an error, which would result in the state ResolverError and an immediate stop of the parsing run.

Resolving References

After all declarations are known, it is possible to resolve all references to these declarations within the code, be it as types, supertypes, or in expressions. This is done using the implementations of IReferenceResolveRunner in the Rubberduck.Parsing.VBA namespace.

First, the state of the modules for which to resolve the references gets set to ResolvingReferences and the overall state gets evaluated. Then the CompilationPasses run: in these, the type names found when resolving the declarations get resolved to the actual types. Moreover, the type hierarchy gets determined, i.e. super- and subtypes get added to the declarations based on the Implements statements in the code.

After that, the parse trees get walked again to find all references to the declarations. This is a slightly complicated process because of the various language constructs in VBA. As a side effect, the variables not resolving to any declaration get collected. Based on these, new declarations get created, which get marked as undeclared. These form the basis for the inspection for undeclared variables.

After all references in a module got resolved, the module state gets set to Ready. If there is some error, the module state gets set to ResolverError. Finally, the overall state gets evaluated and the parsing run ends.

Handling State Changes

On each change of the overall state, an event is raised to which other features can subscribe. Examples are the CodeExplorer, which refreshes on the change to ResolvedDeclarations, and the inspections, which run on the change to Ready.

Handling any state change but the two above is discouraged, except maybe for the change to Pending or the error states if done to disable things. The problem with the other states is that they may never be encountered during a parsing run due to optimizations. Moreover, Rubberduck is generally not in a stable state between Pending and ResolvedDeclarations. Features requiring access to references should generally only handle the Ready state.

Events also get raised for changes of individual module states. However, it should be preferred to handle overall state changes because module states change a lot, especially in large projects.

IMPORTANT: Never request a parse from a state change handler! That will cancel the current parse right after the handlers for this state have run, in favor of the newly requested one.

Doing Only What Is Necessary

When parsing again after a successful parsing run, the easiest way to proceed is to throw away all information from the last parsing run and start from scratch. However, this is quite wasteful, since typically only a few modules change between parsing runs. So, we try to reuse as much information as possible from prior parsing runs. Since our VBA grammar is built for parsing entire modules, the smallest unit of information reuse we can work with is a module.

We only reparse modules that satisfy one of three conditions: they are new, modified, or not in the state Ready. For the first two conditions it should be obvious why we have to reparse such modules. The question is rather how we evaluate these conditions.

To be able to determine whether a module has changed, we save a hash of the code contained in the module whenever the module gets parsed successfully. At the start of the parsing run, we compare the saved hash with the hash of the corresponding freshly loaded component to find the modules with modified content. In addition, we save a flag on the module telling us whether the content hash has ever been saved; if it has not, the module is regarded as new.

For the third condition, the question is rather why we also reparse such modules. The reason is that such modules might be in an invalid state although the content hash had been written in the last parsing run. For example, they might have encountered a resolver error, or they got parsed successfully in the last parsing run but the run got cancelled before the declarations got resolved. In these cases the content hash has already been saved, so the module is neither considered new nor modified. Consequently, it would not be considered for parsing and resolving if only modules satisfying one of the first two conditions were considered. Because of the possibility of such problems, we rather err on the safe side and reparse every module that has not reached the success state Ready.

Since reparsing makes all information we previously acquired about the module invalid, we have to resolve the declarations anew for the modules we reparse. Fortunately, the base characteristics of a declaration only depend on the module it is defined in. So, we only have to resolve declarations for those modules that get reparsed. For references the situation is more complicated.

Since all declarations from the modules we reparse get replaced with new ones, all references to them, all super- and subtypes involving the reparsed modules, and all variable and method types involving the reparsed modules are invalid. So, we have to re-resolve the references for all modules that reference the reparsed modules. To know which modules these are, we save which module references which other modules in an implementation of IModuleToModuleReferenceManager, accessed in the ParseCoordinator via the IParsingCacheService facade. This information gets saved whenever the references for all modules have been resolved successfully, even before evaluating the overall parser state.

In addition to the modules that reference modules that got reparsed, we also re-resolve those modules that referenced modules or project references having just been removed. This is necessary because the references might now point to different declarations. In particular, a renamed module is treated as unrelated to the old one. This means that renaming a module looks to Rubberduck like the removal of the old module and the addition of a new module with a new name.

The final optimization in place on a reparse is that we do not reload the referenced type libraries or the special built-in declarations every time. We just reload those we have not loaded before.

Caching and Cache Invalidation

If you have read the previous paragraph, you might have already realized that the additional speed due to only doing what is necessary comes at a cost: various types of cached data become invalid after parsing and resolving only some modules. So we have to remove that data at a suitable place in the parsing process. To achieve this, the ParseCoordinator primarily calls different methods from the IParsingCacheService facade handed to it.

In the next sections we will work our way up from cached data you would probably never notice we forgot to remove, to data for which forgetting to remove it sends the parser down in flames. After that, we will finish with a few words about refreshing the DeclarationFinder on the RubberduckParserState.

Invalid Type Declarations

The kind of cache invalidation problem you would probably never notice is that the type a variable is declared as has to be replaced if it is a user-defined class and the class module gets reparsed: the class now has a different declaration. Forgetting this would probably just cause some issues with some inspections, because the actual IdentifierReference tying the identifier to the class declaration is not related to the type declaration we save. Fortunately, the TypeAnnotationPass works by replacing the type declaration anyway. So, we just have to do that for all modules for which we resolve references.

Invalid Super- and Subtypes

As mentioned in the section about resolving references, we run a TypeHierarchyPass to determine the super- and subtypes of each class module (and built-in library). After reparsing a module, we have to re-resolve its supertypes. However, we also have to remove the old declaration of the module itself from the supertypes of its subtypes and from the subtypes of its supertypes, which has some further data invalidation consequences. Otherwise, the “Find all Implementations” dialog or the rename refactoring might produce …interesting results for the affected modules.

The removal of the super- and subtypes is performed via an implementation of ISupertypeClearer on all modules we re-resolve, including the modules we reparse, before clearing the state of the modules to be reparsed. Here, a removal of the supertypes is sufficient because everything is wired up such that manipulating the supertypes automatically triggers the corresponding change on the subtypes.

Invalid Module-To-Module References

As with all other reference caches, part of our cache saving which module references which other modules can become invalid when we re-resolve a module; it might just be that the part referencing another module is gone. Fortunately, the way these references are saved does not depend on the actual declarations, so reparsing alone does not cause problems. This allows us to defer the removal of the module-to-module references to the reference resolver.

Being able to postpone the removal until we resolve references is fortunate because of potential problems with cancellations. We use the module-to-module references to determine which modules need to be re-resolved. If they got removed and the parsing run got cancelled before they got filled again in the reference resolver, we would potentially miss modules we have to re-resolve. Then the user would need to modify the affected modules in order to force Rubberduck to re-resolve them.

To handle this problem, the reference resolver itself has a cache of the modules to resolve, which is only cleared at the very end of its work. This is safe because the reference resolver only ever processes modules for which it can find a parse tree on the RubberduckParserState.

Invalid References

Invalid IdentifierReferences to declarations from previous parsing runs can cause any number of strange behaviors, ranging from selections referring to references that were once at that line and column but have since been removed, to refactorings changing things they really should not change.

It is rather clear that the references from all modules to be re-resolved should be removed. However, this is not as straightforward as it seems. The problem is that the references live in a collection on the referenced declaration and not in a collection attached to the module whose code is referencing the declaration. In particular, this makes it easy to forget to remove references from built-in declarations. To avoid such issues, we extracted the logic for removing references by a module into implementations of IReferenceRemover, which is hidden behind the IParsingCacheService facade.
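The following self-contained sketch (hypothetical types again) shows why removing “the references coming from module X” means walking all declarations, built-ins included – the references live on the declaration being referenced, not on the referencing module:

using System.Collections.Generic;

class IdentifierReferenceSketch
{
    public string ReferencingModule; // the module whose code contains the reference
}

class ReferencedDeclarationSketch
{
    public List<IdentifierReferenceSketch> References = new List<IdentifierReferenceSketch>();
}

static class ReferenceRemoverSketch
{
    public static void RemoveReferencesBy(string module, IEnumerable<ReferencedDeclarationSketch> allDeclarations)
    {
        // allDeclarations must include built-in declarations, or references from the module
        // to, say, VBA.Interaction.MsgBox would silently survive the reparse.
        foreach (var declaration in allDeclarations)
        {
            declaration.References.RemoveAll(reference => reference.ReferencingModule == module);
        }
    }
}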

Modules And Projects That No Longer Exist

Now we come to the piece where everything falls to pieces if we are not doing our job: modules and projects that get removed from the VBE. The problem is that some functionality, like the CodeExplorer, has to query information from the components in the VBE via COM interop. If a component no longer exists when the information gets queried, the parsing run dies with a COMException, and there is little we can do about that. So we have to be careful to remove all declarations for no-longer-existing components right at the start of the parsing run.

To find out which modules no longer exist, we simply collect all the modules on the declarations we have cached and compare these to the modules we get from the VBE. More precisely, we compare the identifiers we use for modules, the QualifiedModuleNames. This also catches modules that got renamed. Projects are a bit more tricky, since they are usually treated as equal if their ProjectIds are the same, and we save these in the project help file; so we have to take special care for renamed projects. Once we know the removed projects, their modules get added to the removed modules as well.
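In essence, finding the removed (or renamed) modules is a set difference over the cached module identifiers – something along these lines, with plain strings standing in for QualifiedModuleNames:

using System.Collections.Generic;
using System.Linq;

static class RemovedModulesSketch
{
    public static IReadOnlyCollection<string> FindRemovedModules(
        IEnumerable<string> cachedQualifiedModuleNames, // collected from the cached declarations
        IEnumerable<string> qualifiedModuleNamesInVbe)  // collected from the VBE's components
    {
        // A renamed module has a new QualifiedModuleName, so its old name shows up here
        // as "removed" (and its new name shows up as a module we have never seen before).
        return cachedQualifiedModuleNames
            .Except(qualifiedModuleNamesInVbe)
            .ToList();
    }
}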

Removing the data for removed modules and projects is a bit more complicated than for modules that still exist: once their declarations are removed, there is no sign left that they ever existed. So we have to take special care to remove everything in the right order, guaranteeing that all other information is already gone by the time we erase the declarations, because the parsing run might be cancelled after each step.

The final effect of removing modules is that the modules referencing the removed modules need to be re-resolved. Intuitively one might think that this will always result in a resolver error. However, keep in mind that renaming is handled as removing a module and adding another. Then the references will simply point to the new renamed module. Because of possible cancellations on the way to resolving the references, we immediately set the state of the modules to be re-resolved to ResolvingReferences. This has the effect that they will be reparsed in case of a cancellation.

Note that basically the same procedure is also necessary whenever we reload project references. Accordingly, we do this right after unloading the references, without allowing cancellations in between.

Refreshing the DeclarationFinder

Since declarations and references change in nearly all steps of the parsing process, we have to refresh our primary cached source of declarations, the DeclarationFinder, quite regularly while parsing. Unfortunately, this is a rather computation-intensive thing to do; a lot of dictionaries get populated. So we refresh only if we need to – for example, we do not refresh after loading and unloading project references if nothing changed. However, there are two points in each parsing run where we always have to refresh it: before setting the state to ResolvedDeclarations, and before evaluating the overall state at the end of the parsing run, which results in the Ready state in the success path.

Refreshing before the change to ResolvedDeclarations is necessary to ensure that removed modules vanish from the DeclarationFinder before the handlers of this state change event run, including the CodeExplorer. We have to refresh again at the end because, from inside the ParseCoordinator, we can never be sure that the reference resolver did not do anything; it has its own cache of modules that need to be resolved.

One optimization done in the DeclarationFinder itself is that some collections are populated lazily, in particular those dealing only with built-in declarations. This saves the cost of rebuilding those collections multiple times on each parsing run. However, there is a price to pay: the primary users of the DeclarationFinder are the reference resolver and the inspections, both of which are parallelized. Accordingly, multiple threads can race to populate the collections, which is bad for the performance of the corresponding features. So we compromise by immediately populating the most commonly used collections.
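A hedged sketch of that trade-off, using Lazy<T> (the member names are made up for illustration): collections that are rarely needed are only built on first access, while the hot ones are built eagerly so the parallelized resolver and inspections don’t all block on the same initializer.

using System;
using System.Collections.Generic;

class DeclarationFinderSketch
{
    private readonly List<string> _userDeclarations;           // hot path: built eagerly, up front
    private readonly Lazy<List<string>> _builtInEventHandlers; // cold path: built on first access

    public DeclarationFinderSketch(IEnumerable<string> declarations)
    {
        _userDeclarations = new List<string>(declarations);

        // Thread-safe lazy initialization: whichever thread asks first pays the cost;
        // any other thread asking at the same time waits instead of duplicating the work.
        _builtInEventHandlers = new Lazy<List<string>>(ExpensiveBuiltInScan, isThreadSafe: true);
    }

    public IReadOnlyList<string> UserDeclarations => _userDeclarations;
    public IReadOnlyList<string> BuiltInEventHandlers => _builtInEventHandlers.Value;

    private static List<string> ExpensiveBuiltInScan()
    {
        // ... populate from the built-in declarations; only ever runs if someone needs it ...
        return new List<string>();
    }
}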

Inside Rubberduck (pt.1)

https://www.cgtrader.com/free-3d-print-models/art/sculptures/rubber-duck-voronoi-style
“Rubber Duck” – Voronoi Style Free 3D print model by Roman Hegglin

Maybe you’ve browsed Rubberduck’s repository, or forked it to get a closer look at the source code. Or maybe you didn’t but you’re still curious about how it might all work.

I haven’t written a blog post in quite a long while (been busy!), so I thought I’d start a series that describes Rubberduck’s internals, starting at the beginning.

Part I: Starting up

Rubberduck embraces the Dependency Injection principle: depend on abstractions, not concrete implementations. Hand-in-hand with DI, the Inversion of Control principle describes how all the decoupled pieces come together. This decoupling enables testable code, which is fundamental in any project of that size – especially when your add-in ships a unit testing framework as one of its features.

Because RD is a rather large project, instead of injecting the dependencies (and their dependencies, and these dependencies’ dependencies, and so on…) “by hand”, we use Ninject to do it for us.

We configure Ninject in the Rubberduck.Root namespace, more specifically in the complete mess of a class, RubberduckModule. I say complete mess because, well, a couple of things are wrong in that file. How it steals someone else’s job by constructing the menus, for example. Or how it’s completely under-using the conventions Ninject extension. The abstract factory convention is nice though: Ninject will automatically inject a generated proxy type that implements the factory interface – you never need a concrete implementation of a factory class!

The add-in’s entry point is located in Rubberduck._Extension, the class that the VBE discovers in the Windows Registry as an add-in to load. This class implements the IDTExtensibility2 interface, which looks essentially like this:

public interface IDTExtensibility2
{
    void OnAddInsUpdate(ref Array custom);
    void OnConnection(object Application, ext_ConnectMode ConnectMode, object AddInInst, ref Array custom);
    void OnStartupComplete(ref Array custom);
    void OnBeginShutdown(ref Array custom);
    void OnDisconnection(ext_DisconnectMode RemoveMode, ref Array custom);
}

The Application object is the VBE itself – the very same VBE object you’d get in VBA from the host application’s Application.VBE property. There are a number of things to consider in how these methods are implemented, but everything essentially starts in OnConnection and ends in OnDisconnection.

So we first get hold of a reference to the precious Application and AddInInst objects that we receive here, but because we don’t want a direct dependency on the VBIDE API throughout Rubberduck, we wrap them with wrapper types that implement our IVBE and IAddIn interfaces (yes, we wrapped every single type in the VBIDE API type library; that way we can at least try to make Rubberduck work in VB6):

var vbe = (VBE) Application;
_ide = new VBEditor.SafeComWrappers.VBA.VBE(vbe);
VBENativeServices.HookEvents(_ide);

var addin = (AddIn)AddInInst;
_addin = new VBEditor.SafeComWrappers.VBA.AddIn(addin) { Object = this };

Then InitializeAddIn is called. That method looks for the configuration settings file, and sets the Thread.CurrentUICulture accordingly. When we know that the settings aren’t disabling the startup splash, we get our build number from the running assembly and bring up the splash screen. Only then do we call the Startup method; when Startup returns (or throws), the splash screen is disposed.

The method is pretty simple:

private void Startup()
{
    var currentDomain = AppDomain.CurrentDomain;
    currentDomain.AssemblyResolve += LoadFromSameFolder;

    _kernel = new StandardKernel(
        new NinjectSettings {LoadExtensions = true}, 
        new FuncModule(), 
        new DynamicProxyModule());
    _kernel.Load(new RubberduckModule(_ide, _addin));

    _app = _kernel.Get<App>();
    _app.Startup();

    _isInitialized = true;
}

We initialize a Ninject StandardKernel, load our module (give it our IVBE and IAddIn object references), get an App object and call its Startup method, where the fun stuff begins:

public void Startup()
{
    EnsureLogFolderPathExists();
    EnsureTempPathExists();
    LogRubberduckStart();
    LoadConfig();
    CheckForLegacyIndenterSettings();
    _appMenus.Initialize();
    _hooks.HookHotkeys(); // need to hook hotkeys before we localize menus, to correctly display ShortcutTexts
    _appMenus.Localize();

    UpdateLoggingLevel();

    if (_config.UserSettings.GeneralSettings.CheckVersion)
    {
        _checkVersionCommand.Execute(null);
    }
}

The method names speak for themselves: we conditionally hit the registry looking for a legacy Smart Indenter key to import indenter settings from, and run the asynchronous “version check” command, which sends an HTTP request to http://rubberduckvba.com/build/version/stable, a URL that merely returns the version number of the build that’s running on the website: by comparing that version with the running version, Rubberduck can let you know when a new version is available.
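For the curious, that version check boils down to something like this – a hedged sketch, not the actual Rubberduck implementation:

using System;
using System.Net.Http;
using System.Reflection;
using System.Threading.Tasks;

static class VersionCheckSketch
{
    public static async Task<bool> IsNewVersionAvailableAsync()
    {
        using (var client = new HttpClient())
        {
            // The endpoint simply returns the version number of the latest stable build.
            var response = await client.GetStringAsync("http://rubberduckvba.com/build/version/stable");
            if (!Version.TryParse(response.Trim(), out var latest))
            {
                return false; // unexpected payload: fail quietly rather than nagging the user
            }

            var running = Assembly.GetExecutingAssembly().GetName().Version;
            return latest > running;
        }
    }
}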

That’s literally all there is to it: just with that, we have a backbone to build with. If we want a new command, we just implement an ICommand, and if that command goes into a menu we hook it up to a CommandMenuItem class. Commands often delegate their work to more specialized objects, e.g. a refactoring, or a presenter of some sort.
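Hypothetically – Rubberduck’s actual command base classes are a bit richer than this – adding a command looks something like:

using System;
using System.Windows.Input;

class SayHelloCommand : ICommand
{
    public event EventHandler CanExecuteChanged { add { } remove { } } // always executable in this sketch

    public bool CanExecute(object parameter) => true;

    public void Execute(object parameter)
    {
        // A real command delegates to a more specialized object – a refactoring,
        // a dialog presenter, a parser request, etc.
        Console.WriteLine("Hello from a command!");
    }
}

class SayHelloCommandMenuItem
{
    private readonly ICommand _command;

    public SayHelloCommandMenuItem(ICommand command)
    {
        _command = command; // injected for us by the IoC container
    }

    public string Key => "SayHelloCommandMenuItem"; // resource key used to localize the caption
    public void OnClick() => _command.Execute(null);
}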

Next post will dive into how Rubberduck’s parser and resolver work.

to be continued…

2.0.14?

Recently I asked on Twitter what the next RD News post should be about.

next-rdnews-post-survey-results

Seems you want to hear about upcoming new features, so… here it goes!


The current build contains a number of breakthrough features; I mentioned an actual Fakes framework for Rubberduck unit tests in an earlier post. That will be an ongoing project on its own though; as of this writing the following are implemented:

  • Fakes
    • CurDir
    • DoEvents
    • Environ
    • InputBox
    • MsgBox
    • Shell
    • Timer
  • Stubs
    • Beep
    • ChDir
    • ChDrive
    • Kill
    • MkDir
    • RmDir
    • SendKey

As you can see there’s still a lot to add to this list, but we’re not going to wait until it’s complete to release it. So far everything we’re hijacking – er, hooking up – is located in VBA7.DLL, but ideally we’ll eventually have fakes/stubs for the scripting runtime (FileSystemObject), ADODB (database access), and perhaps even host applications’ own libraries (stubbing the Excel object has been a dream of mine) – they’ll probably become available as separate plug-in downloads, as Rubberduck is heading towards a plug-in architecture.

The essential difference between a Fake and a Stub is that a Fake‘s return value can be configured, whereas a Stub doesn’t return a value. As far as the calling VBA code is concerned, that’s nothing to care about though: it’s just another member call:

[ComVisible(true)]
[Guid(RubberduckGuid.IStubGuid)]
[EditorBrowsable(EditorBrowsableState.Always)]
public interface IStub
{
    [DispId(1)]
    [Description("Gets an interface for verifying invocations performed during the test.")]
    IVerify Verify { get; }

    [DispId(2)]
    [Description("Configures the stub such as an invocation assigns the specified value to the specified ByRef argument.")]
    void AssignsByRef(string Parameter, object Value);

    [DispId(3)]
    [Description("Configures the stub such as an invocation raises the specified run-time eror.")]
    void RaisesError(int Number = 0, string Description = "");

    [DispId(4)]
    [Description("Gets/sets a value that determines whether execution is handled by Rubberduck.")]
    bool PassThrough { get; set; }
}

So how does this sorcery work? Presently, quite rigidly:

[ComVisible(true)]
[Guid(RubberduckGuid.IFakesProviderGuid)]
[EditorBrowsable(EditorBrowsableState.Always)]
public interface IFakesProvider
{
    [DispId(1)]
    [Description("Configures VBA.Interactions.MsgBox calls.")]
    IFake MsgBox { get; }

    [DispId(2)]
    [Description("Configures VBA.Interactions.InputBox calls.")]
    IFake InputBox { get; }

    [DispId(3)]
    [Description("Configures VBA.Interaction.Beep calls.")]
    IStub Beep { get; }

    [DispId(4)]
    [Description("Configures VBA.Interaction.Environ calls.")]
    IFake Environ { get; }

    [DispId(5)]
    [Description("Configures VBA.DateTime.Timer calls.")]
    IFake Timer { get; }

    [DispId(6)]
    [Description("Configures VBA.Interaction.DoEvents calls.")]
    IFake DoEvents { get; }

    [DispId(7)]
    [Description("Configures VBA.Interaction.Shell calls.")]
    IFake Shell { get; }

    [DispId(8)]
    [Description("Configures VBA.Interaction.SendKeys calls.")]
    IStub SendKeys { get; }

    [DispId(9)]
    [Description("Configures VBA.FileSystem.Kill calls.")]
    IStub Kill { get; }

...

Not an ideal solution – the IFakesProvider API needs to change every time a new IFake or IStub implementation needs to be exposed. We’ll think of a better way (ideas welcome)…

So we use the awesomeness of EasyHook to inject a callback that executes whenever the stubbed method gets invoked in the hooked library. Implementing a stub/fake is pretty straightforward… as long as we know which internal function we’re dealing with – for example this is the Beep implementation:

internal class Beep : StubBase
{
    private static readonly IntPtr ProcessAddress = EasyHook.LocalHook.GetProcAddress(TargetLibrary, "rtcBeep");

    public Beep() 
    {
        InjectDelegate(new BeepDelegate(BeepCallback), ProcessAddress);
    }

    [UnmanagedFunctionPointer(CallingConvention.StdCall, SetLastError = true)]
    private delegate void BeepDelegate();

    [DllImport(TargetLibrary, SetLastError = true)]
    private static extern void rtcBeep();

    public void BeepCallback()
    {
        OnCallBack(true);

        if (PassThrough)
        {
            rtcBeep();
        }
    }
}

As you can see the VBA7.DLL (the TargetLibrary) contains a method named rtcBeep which gets invoked whenever the VBA runtime interprets/executes a Beep keyword. The base class StubBase is responsible for telling the Verifier that a usage is being tracked, for tracking the number of invocations, …and for disposing all attached hooks.

The FakesProvider disposes all fakes/stubs when a test stops executing, and knows whether a Rubberduck unit test is running: that way, Rubberduck fakes will only ever work during a unit test.

The test module template has been modified accordingly: once this feature is released, every new Rubberduck test module will include the good old Assert As Rubberduck.AssertClass field, but also a new Fakes As Rubberduck.FakesProvider module-level variable that all tests can use to configure their fakes/stubs, so you can write a test for a method that Kills all files in a folder, and verify and validate that the method does indeed invoke VBA.FileSystem.Kill with specific arguments, without worrying about actually deleting anything on disk. Or a test for a method that invokes VBA.Interaction.SendKeys, without actually sending any keys anywhere.

And just so, a new era begins.


Awesome! What else?

One of the oldest dreams in the realm of Rubberduck features is to be able to add/remove module and member attributes without having to manually export and then re-import the module every time. None of this is merged yet (still very much WIP), but here’s the idea: a bunch of new @Annotations, and a few new inspections:

  • MissingAttributeInspection will compare module/member attributes to module/member annotations, and when an attribute doesn’t have a matching annotation, it will spawn an inspection result. For example if a class has a @PredeclaredId annotation, but no corresponding VB_PredeclaredId attribute, then an inspection result will tell you about it.
  • MissingAnnotationInspection will do the same thing, the other way around: if a member has a VB_Description attribute, but no corresponding @Description annotation, then an inspection result will also tell you about it.
  • IllegalAnnotationInspection will pop a result when an annotation is illegal – e.g. a member annotation at module level, or a duplicate member or module annotation.

These inspections’ quick-fixes will add the missing attribute or annotation, or remove the illegal annotation, accordingly. The new annotations are:

  • @Description: takes a string parameter that determines a member’s DocString, which appears in the Object Browser‘s bottom panel (and in Rubberduck 3.0’s eventual enhanced IntelliSense… but that one’s quite far down the road). “Add missing attribute” quick-fix will be adding a [MemberName].VB_Description attribute with the specified value.
  • @DefaultMember: a simple parameterless annotation that makes a member the class’ default member; the quick-fix will be adding a [MemberName].VB_UserMemId attribute with a value of 0. Only one member in a given class can legally have this attribute/annotation.
  • @Enumerator: a simple parameterless annotation that translates into a [MemberName].VB_UserMemId attribute with a value of -4, which is required when you’re writing a custom collection class that you want to be able to iterate with a For Each loop construct.
  • @PredeclaredId: a simple parameterless annotation that translates into a VB_PredeclaredId (class) module attribute with a value of True, which is how UserForm objects can be used without Newing them up: the VBA runtime creates a default instance, in global namespace, named after the class itself.
  • @Internal: another parameterless annotation, controlling the VB_Exposed module attribute, which determines whether a class is exposed to other, referencing VBA projects. The attribute value will be False when this annotation is specified (it’s True by default).

Because the only way we’ve got to do this (for now) is to export the module, modify the attributes, save the file to disk, and then re-import the module, the quick-fixes will work against all results in that module, and synchronize attributes & annotations in one pass.

Because document modules can’t be imported into the project through the VBE, these attributes will unfortunately not work in document modules. Sad, but on the flip side, this might make [yet] an[other] incentive to implement functionality in dedicated modules, rather than in worksheet/workbook event handler procedures.

Rubberduck command bar addition

The Rubberduck command bar has been used as some kind of status bar from the start, but with context sensitivity, we’re using the VB_Description attributes we’re picking up, the @Description annotations, and the DocString metadata in the VBA project’s referenced COM libraries, to display that documentation right there in the toolbar:

docstrings-in-rdbar.PNG

Until we get custom IntelliSense, that’s as good as it’s going to get I guess.


TokenStreamRewriter

As of next release, every single modification to the code is done using Antlr4‘s TokenStreamRewriter – which means we’re no longer rewriting strings and using the VBIDE API to rewrite VBA code (which means a TON of code has just gone “poof!”): we now work with the very tokens that the Antlr-generated parser itself works with. This also means we can now make all the changes we want in a given module, and apply the changes all at once – by rewriting the entire module in one go. This means the VBE’s own native undo feature no longer gets overwhelmed with a rename refactoring, and it means fewer parses, too.
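Here is a hedged sketch of that workflow (the lexer name and the token being replaced are assumptions for illustration; a real refactoring locates its tokens through the parse tree): edits are queued against the token stream, and the whole module is rewritten in a single pass when the rewritten text is requested.

using Antlr4.Runtime;

static class RewriterSketch
{
    public static string ReplaceFirstToken(string moduleCode, string newText)
    {
        var input = new AntlrInputStream(moduleCode);
        var lexer = new VBALexer(input);          // the Antlr-generated lexer
        var tokens = new CommonTokenStream(lexer);
        tokens.Fill();

        var rewriter = new TokenStreamRewriter(tokens);

        // Queue an edit: nothing is modified yet, we're only recording the instruction.
        rewriter.Replace(0, newText);

        // All queued edits are applied at once, producing the new module contents in one go.
        return rewriter.GetText();
    }
}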

There’s a bit of a problem though. There are things our grammar doesn’t handle:

  • Line numbers
  • Dead code in #If / #Else branches

Rubberduck is kinda cheating, by pre-processing the code such that the parser only sees WS (whitespace) tokens in their place. This worked well… as long as we were using the VBIDE API to rewrite the code. So there’s this part still left to work out: we need the parser’s token stream to determine the “new contents” of a module, but the tokens in there aren’t necessarily the code you had in the VBE before the parse was initiated… and that’s quite a critical issue that needs to be addressed before we can think of releasing.


So we’re not releasing just yet. But when we do, it’s likely not going to be v2.0.14, for everything described above: we’re looking at v2.1 stuff here, and that makes me itch to complete the add/remove project references dialog… and then there’s data-driven testing that’s scheduled for 2.1.x…

To be continued…

Bubbly Run-Time Errors

300 feet below the surface, in a sunken wreck from another age, a rotting wooden deck silently collapses under the weight of heavy cast iron cannons. As the sea floor becomes a thick cloud of millennial dust, the weaponry cracks a cask of over-aged priceless wine, and a tiny amount of air, trapped centuries ago, is freed. Under the tremendous, crushing pressure of the oceanic bottom, the bubbles are minuscule at first. As the ancestral oxygen makes its final journey from the bottom of the ocean up to the surface, the bubbles grow in size with the decreasing pressure – and when they finally reach their destination to blend with the contemporary atmosphere, they erupt with a bubbly “plop” as they release themselves from the water that held them quietly imprisoned all these years.

Uh, so how does this relate to code in any way?

Bubbles want to explode: the same applies to most run-time errors.

When an error is raised 300 feet down the call stack, it bubbles up to its caller, then to the caller of that caller, and so on until it reaches the entry point – the surface – and blows everything up. When the error is unhandled at least.

And so they told you to handle errors. That every procedure must have an error handler.

Truth is, this is utter cargo-cultist BS. Not every procedure must handle every error. Say you have an object that’s responsible for setting up an ADODB Connection, parameterizing some SQL Command on the fly, and returning a Recordset. You could handle all errors inside that class, trap all the bubbles, and return Nothing instead of a result when something goes wrong. Neat huh? Yeah. Until the caller wants to know why their code isn’t working. That SqlCommand class cannot handle everything: errors need to bubble up to the calling code, for the calling code to handle.

The calling code might be another class module, with a function responsible for – I don’t know – pulling a list of products from a database and returning an array of strings that this function’s own caller uses to populate a ComboBox control, in a UserForm’s Initialize handler. So the data service class lets SqlCommand errors bubble up to its own caller; the UserForm’s Initialize handler receives the error, understands that it won’t be able to populate its ComboBox, and in response decides to go up in flames by bubbling up the error to its own caller – some parameterless procedure in a Macros module, that was called when the user clicked a nicely formatted shape on a dedicated worksheet.

That’s the entry point. That is where the bubbling stops. That procedure was responsible for bringing up a form for the user to enter some data, but something happened (the detailed information is in the Err object) and we can’t do that now – so we abort the form and display a nice user-friendly message in a MsgBox instead, and we can even send the actual error details into a new Outlook email to helpdesk@contoso.com.

Getting a grip on the handle

Most errors aren’t handled where they’re raised. Well, some are, obviously. But to say that every procedure should have its error handler is just as blatantly wrong as saying no procedure should ever have any error handler: “only a Sith deals in absolutes”.

So which errors should be killed on-the-spot, and which errors should be allowed to bubble up?

Avoidable errors

The vast majority of run-time errors occur due to lack of proper input validation code: we take a value and assume it’s of a type we’re expecting, or at least one we can work with. We assume its format, we assume its location, we assume …lots of things. The more assumptions code makes, the more error-prone it is. Problem is, we don’t always realize all the assumptions we make – and that’s when run-time errors come and bite us. These are completely avoidable errors: they shouldn’t be handled at all, for they are bugs. And we want bugs to blow things up. So if you have code making assumptions – for example a row number is never going to be zero – then you have bugs that are easy to fix (and that a good unit test coverage should prevent, BTW)… and it boils down, mostly, to proper input validation. Avoiding avoidable errors is the #1 best bug-preventing thing you can do.

Of course this supposes the assumptions we make are conscious ones – sometimes, code makes assumptions we don’t realize we’re making. For example, VBA code that implicitly refers to the active worksheet often assumes that the active sheet is one specific sheet:

foo = Sheet1.Range(Cells(i, j), Cells(i, j)).Value

The above code assumes Sheet1 is active, because the two unqualified Cells calls implicitly refer to the active worksheet. Avoidable. If foo is declared as a String and Sheet1 is active, that same code will still blow up if the cell contains a #VALUE! error. Assumptions are very easy to make! Fortunately they’re also easy to avoid.

Errors you know how to handle

Sometimes you’ll run code that can raise an error even if you’ve validated all inputs – if the SQL server is down, trying to connect to it will blow up your code if you don’t handle that situation. Or the user might not be authorized to run the SQL command, or whatever. The decision on whether to handle an error on-the-spot or let it bubble up to the caller depends on how well you’ve split the responsibilities among your modules and procedures: a utility function usually has no business handling/swallowing its own errors. And unless you’re running the current [not yet released] 2.0.14.x Rubberduck build, your unit tests can’t mock/fake a MsgBox call, so you have code paths that cannot be cleanly tested.

Looking at it from the client code’s perspective is how you’re going to know what kind of errors and “bad result” outputs you want to be dealing with. And if that client code is a unit test, then you’re probably doing the right thing, whatever that is.

Other times you’ll run into an error, but you know you can simply, gracefully and usefully recover from that error, and resume normal execution – these errors, if they can’t be avoided, should be the kind to handle on-the-spot.

Everything else

For everything else, you’ll want bubbles. Not all the way up though – you’ll want to catch them before they surface and pop in the user’s face! But if your code validates all inputs and makes little or no assumptions, and handles the specific errors you know could happen because roses are red and violets are blue… at the top of every call stack there should be a catch-all handler – an ultimate bubble catcher, that gracefully handles everything other code had to let through.


So…

Rubberduck is never going to tell you to sprinkle error-handling code everywhere. But I think we could have an inspection that warns you if you have a [possible] entry point that lets run-time errors bubble up unhandled.

What do you think? What else can Rubberduck do for you? Should Rubberduck treat any object-returning method as potentially returning Nothing, and suggest that you validate the method’s return value? You would right-click any Range.Find call, and if the returned reference is never compared against Nothing then Rubberduck could introduce an If block that does just that, making the rest of the code path safe to execute in the case of a failing call. Just thinking out loud here…