maintainability – Rubberduck News

State as a Service

August 31, 2024August 31, 2024 Rubberduck VBA2 Comments

Say you have a worksheet that contains a table with various settings, or options to run some macro with. Whatever the macro does, whatever it uses these values for, it must somehow solve this problem: how to get these values out of the worksheet and into the program?

We have a ListObject to play with, and since the table is in a specific worksheet, that’s the worksheet module we’re going to be editing. Since the very existence of that table matters for the rest of whatever this macro ends up doing, we’re going to make it clear it’s not an accident by making it a property of the worksheet (class) module – something like this:

Private Const TableName As String = "Table1"

Public Property Get SettingsTable() As ListObject
  Static Value As ListObject
  If Value Is Nothing Then
    On Error Resume Next
      Set Value = Me.ListObjects(TableName)
    On Error GoTo 0
  End If
  Set SettingsTable = Value
End Property

Property Get procedures usually do not raise an error – here if the table doesn’t exist the property will return Nothing, which should cause the calling code to blow up with error 91, and this would be reasonably expected behavior in this case. The Static local stands in for a module-scoped variable declaration that would be needed if the property had a setter procedure; because it’s only needed in one place, we can declare it locally and retain the module-scoped behavior with the Static keyword.

“Static” in VBA basically means “shared”, and “Shared” in VB.NET means what “static” means in essentially every other language with such semantics. Static as a scope modifier in .NET means a member belongs to the type (as opposed to an instance of the type in question), but in Classic-VB it is used for declaring local variables that retain their value between procedure calls, and if used as a modifier at procedure level it makes all locals behave as such… which [very likely] isn’t a good idea.

So we have now given ourselves access to the table, and we can just do Sheet1.SettingsTable to access it from anywhere.

But what if we don’t want that? If we know the settings each have a unique name and have a value that might be a String, a Double, a Date, or a Boolean. The table might just as well be empty for now, anyway.

If we don’t formalize access to the settings, then every place that needs them might be doing things differently! Imagine the chaos if sometimes an option is retrieved with Application.VLookup, elsewhere with a loop over SettingsTable.Rows, and then another one could be getting the value with an offset from the result of SettingsTable.DataBodyRange.Find, and there’s another couple of different but not always equivalently dangerous or misguided ways to go about retrieving a value from that table, and there’s no need to have all of them around.

We could write a function that accepts the name of a setting and returns a Variant holding the associated value if it exists, but what if we need to get the value multiple times, we’re going to read it off the worksheet every time?

Writing to a Range is perhaps the most expensive thing a macro can do, but reading anything from one comes close. In fact, a macro that performs well is usually a macro that limits its interactions with worksheets and the …entire Excel object model.

What we want is a method that iterates the table rows once, yielding an OptionValue class instance for each setting value. So we add this new class module and define an OptionValue class with a Name (String) and a Value (Variant) property.

But then if we call that method every time we want to get a setting value, things are going to be much worse than if we just used Application.VLookup every time, so what gives?

Grabbing every row of that table and turning them into as many OptionValue instances is an action, it wants to be a verb. Things we do with any kind of state are very often well described with a method’s name that starts with a verb, and that’s great already (especially if the rest of the name is actually somewhat descriptive), but a free-floating verb is up for grabs by anyone.

So we’re going to encapsulate it by making that method a member of a class for which SettingsService might be a good name: it’s a service that has the means to abstract away the worksheet and only expose OptionValue objects, and with it the rest of the code no longer needs to deal with the nitty-gritty details of how these objects come into existence, or how long they’ve been around.

Because we can write this class in such a way that we read the settings once from the worksheet (say, in the Initialize handler of the class itself), cache them in a private keyed collection (or private dictionary), and as long as our instance is alive we can return the cached option values whenever someone asks for them, and then they’ll be getting them without needing to hit the worksheet.

By adding an indexed property, we can even have a default member that makes sense, and the rest of the code can read its configuration like this:

Dim Settings As SettingsService
Set Settings = New SettingsService

If Settings("SomeSetting") Then
  Debug.Print "SomeSetting is ON"
Else
  Debug.Print "SomeSetting is OFF"
End If

Settings(“SomeSetting”) stands in for so many things here, all of which would distract from what this macro is supposed to be doing, however working with Variant like this is annoying, and the use of default members is abstracting away mechanics that we’d usually rather be explicit about, so we should instead expose typed methods, so we know (we/us, but also the compiler and Rubberduck here) what actual types and members we’re dealing with:

Dim Settings As SettingsService
Set Settings = New SettingsService

If Settings.GetBoolean("SomeSetting") Then
  Debug.Print "SomeSetting is ON"
Else
  Debug.Print "SomeSetting is OFF"
End If

Where GetBoolean being a method/function rather than a property should make us feel much better about throwing errors: if the setting doesn’t exist, we blow up. If the setting exists but its Variant subtype isn’t Boolean then we probably want to blow up rather than return gibberish. If it exists and it’s of the expected data type, we return the setting value, converted to an actual Boolean.

That means SettingsService also needs GetDate, GetDouble (and maybe GetInteger), and of course GetString, leaving the Variant values completely encapsulated in the service: callers don’t need to care about any of that, and that’s neat.

There is no Worksheet

The only thing that needs to do anything with SettingsSheet is the SettingsService. Nothing else needs to access it for any reason whatsoever, because we have a service that fully abstracts it away, so it might as well not be there.

And the macro should still work, assuming it knows how to deal with and recover from a missing setting value.

Settings could be moved to a flat file, another workbook, or a database, and only one method would need to change: the one that’s reading and caching the settings from the worksheet, would instead be connecting and querying a database – and would still only need to hit it once.

And none of anything else would need to change, because it’s all completely yielding that responsibility to this service.

Compare to what it would be like to change inline VLookups and Range.Find calls (wherever they are) to read from another source, and you can quickly see the benefits of having sane abstraction levels.

Code that desperately wants to control how everything is done at the lowest level of detail, is tedious and heavy. It’s hard to tell what the role of such a procedure is, because too many things are going on and the signal gets drowned in noise.

There doesn’t need to be a service class, or even an OptionValue class: any distinct procedure scope that’s clearly responsible for retrieving a valid setting value is a good step forward. But moving the state into an object makes it easier to control its lifetime, and by encapsulating behavior we clean up the calling modules the same way extracting procedure out of a larger scope cleans up that larger scope. It reduces the cognitive load and complexity by moving away code that’s concerned with the peripherals of any given macro’s purpose, and thus increases the cohesion of the macro’s module because things that aren’t directly related to what the macro is specifically responsible for, are simply elsewhere, where they belong.

Declaring and Using Variables in VBA

May 7, 2023October 10, 2023 Rubberduck VBA5 Comments

Among the very first language keywords one comes across when learning VBA, is the Dim keyword; declaring and using variables is easily the first step one takes on their journey away from the macro recorder.

About Scopes

Before we can really understand what variables do and what they’re useful for, we need to have a minimal grasp of the concept of scoping. When you record a macro, the executable instructions get written for you inside a procedure scope that’s delimited with Sub and End Sub tokens (tokens are the grammatical elements of the language, not necessarily single keywords), with the identifier name of the macro after the Sub keyword:

Sub DoSomething()
    ' executable code goes here
End Sub

Exactly none of the above code is executable, but compiling it creates an entry point that the VBA runtime can invoke and execute, because the procedure is implicitly public and as such, can be accessed from outside the “Module1” module it exists in (with or without Option Private Module). In other words the above code could tell us explicitly what the scope of the DoSomething procedure is, using the Public keyword before the Sub token:

Public Sub DoSomething()
    ' executable code goes here
End Sub

If we used Private instead, then Excel (or whatever the host application is) could not “see” it, so you would no longer find DoSomething in the list of available macros, and other modules in the same VBA project couldn’t “see” or invoke it either; a private procedure is only callable from other procedures in the same module.

Standard modules are themselves public, so you can refer to them from any other module in your project, and invoke their public members using the member access operator, the dot:

Public Sub DoStuff()
   Module1.DoSomething
End Sub

Because public members of public modules become part of a global namespace, the public members can be referred to without an explicit qualifier:

Public Sub DoStuff()
    DoSomething
End Sub

While convenient to type, it also somewhat obscures exactly what code is being invoked: without an IDE and a “navigate to definition” command, it would be pretty hard to know where that other procedure is located.

The global namespace contains not only the public identifiers from your VBA project, but also all the public identifiers from every referenced library, and they don’t need to be qualified either so that’s how you can invoke the VBA.Interaction.MsgBox function without qualifying with the library or module it’s defined in. If you write your own MsgBox function, every unqualified MsgBox call in that project is now invoking that new custom function, because VBA always prioritizes the host VBA project’s own type library over the referenced ones (every VBA project references the VBA standard library and the type library that defines the COM extension and automation model for the host application).

But that’s all going outward from a module: within a module, there are two levels of scoping: module level members can be accessed from anywhere in the module, and procedure level declarations can be accessed from anywhere inside that procedure.

Module-level declarations use Public and Private modifiers, and procedure-level ones use the Dim keyword. Dim is legal at module level too, but because Private and Public are only legal at module level (you can’t use them for procedure scope / “local” declarations), Rubberduck encourages you to use Dim for locals only.

For example a variable declared in a conditional block is allocated when the stack frame is entered regardless of the state when the condition gets evaluated, and a variable declared inside a loop body is the same variable outside that loop, and for every iteration of that loop as well: there is no “block scope” in VBA.

Non-Executable Statements

Procedures don’t only contain executable instructions: Dim statements, like statements with Private and Public modifiers, are declarative and do not do anything. You cannot place a debugger breakpoint (F9) on such statements, either. This is important to keep in mind: the smallest scope in VBA is the procedure scope, and it includes the parameters and all the local declarations of that procedure – regardless of where in the procedure body they’re declared at, so the reason to declare variables as you need them has more to do with reducing mental load and making it easier to later extract a method by moving a chunk of code into another procedure scope. Declaring all locals at the top of a procedure often results in unused variables dangling, because of the constant up-and-down, back-and-forth scrolling that inevitably happens when a procedure eventually grows; the further a variable is out of its context, the more it becomes a liability.

Const statements (to declare constant values) are also legal in local/procedure scope, and they’re identically non-executable; the same applies to Static declarations (variables that retain their value between invocations).

ReDim statements however are executable, even though they also count as a compile-time declaration – but they don’t count as a duplicate declaration, so the presence of ReDim doesn’t really justify skipping an initial Dim declaration.

Explicitness as an Option

Not only access modifiers can be implicit in VBA; the language lets you define a Variant variable on the fly, without a prior explicit declaration. If this behavior is practical for getting the job done and will indeed work perfectly fine, it’s also unnecessarily putting you at risk of typos that will only become a problem at run-time, if you’re lucky close enough to the source of the problem to hunt down and debug. By specifying Option Explicit at the top of every module, the compiler will treat implicit declarations as compile-time errors, telling you about the problem before it even becomes one.

Option Explicit has its limits though, and won’t protect you from typos in late-bound member calls, where invoking a member that doesn’t exist on a given object throws error 438 at run-time.

When to Declare a Variable

There are many reasons to declare a variable, but if you’re cleaning up macro recorder code the first thing you’ll want to do is to remove the dependency on Selection and qualify Range and Cells member calls with a proper Worksheet object.

For example before might look like this:

Sub Macro1
    Range("A10") = 42
    Sheet2.Activate
    Range("B10") = 42
End Sub

And after might look like this:

Public Sub Macro1()
    Dim Sheet As Worksheet
    Set Sheet = ActiveSheet
    Sheet.Range("A10") = 42
    Sheet2.Activate
    Sheet.Range("B10") = 42
End Sub

The two procedures do exactly the same thing, but only one of them is doing it reliably. If the Sheet2 worksheet is already active, then there’s no difference and both versions produce identical output. Otherwise, one of them writes to whatever the ActiveSheet is, activates Sheet2, and then writes to that sheet.

There’s a notion of state in the first snippet that adds to the number of things you need to track and think about in order to understand what’s going on. Using variables, exactly what sheet is active at any point during execution has no impact whatsoever on the second snippet, beyond the initial assignment.

It’s that (global) state that’s behind erratic behavior such as code working differently when you leave it alone than when you step through – especially when loops start getting involved. Managing that global state makes everything harder than necessary.

Keep your state close, and your ducky closer, they say.

Set: With or Without?

Not being explicit can make the code read ambiguously, especially when you consider that objects in VBA can have default members. In the above snippets, the value 42 reads like it’s assigned to… the object that’s returned by the Range property getter of the Worksheet class. And that’s weird, because normally you would assign to a property of an object, not the object itself. VBA understands what it needs to do here, because the Range class says “I have a default member!” and that default member is implemented in such a way that giving it the value 42 does exactly the same as if the Range.Value member was being invoked explicitly. Because that behavior is an implementation detail, it means the only way to know is to read its documentation.

The Set keyword modifies an assignment instruction and says “we’re assigning an object reference”, so VBA doesn’t try to check if there’s a default member on the left-hand side of the assignment operator, and the compiler expects an object reference on the right-hand side, …and then only throws at run-time when that isn’t the case – but because this information is all statically available at compile-time, Rubberduck can warn about such suspicious assignments.

So to assign a variable that holds a reference to a Range object, we must use the Set keyword. To assign a variable that holds the value of a Range object, we must not use the Set keyword. Declaring an explicit data type for every variable (meaning not only declaring things, but also typing them) helps prevent very preventable bugs and subtle issues that can be hard to debug.

As SomethingExplicit

Whether Public or Private, whether local or global, most variables are better off with a specific data type using an As clause:

Dim IsSomething
Dim SomeNumber As Long
Dim SomeAmount As Currency
Dim SomeValue As Double
Dim SomeDateTime As Date
Dim SomeText As String
Dim SomeSheet As Worksheets
Dim SomeCell As Range

Using an explicit data/class/interface type, especially with objects, helps keep things early-bound, meaning both the compiler and static code analysis tools (like Rubberduck) can better tell what’s going on before the code actually gets to run.

We can often chain member calls; the Worksheets collection’s indexer necessarily yields a Worksheet object, no?

Public Sub Macro1()
    ActiveWorkbook.Worksheets("Sheet1").Range("A1").Value = 42
End Sub

If you manually type this instruction, you’ll notice something awkward that should be unexpected when you type the dot operator after Worksheets(“Sheet1”), because the property returns an Object interface… which tells VBA it has members that can be invoked, but leaves no compile-time clue about any of them. That’s why the Range member call is late-bound and only resolved at run-time, and because the compiler has no idea what the members are until the code is running, it cannot populate the completion list with the members of Worksheet, and will merrily compile and attempt to invoke a Range member.

By breaking the chain and declaring variables, we restore compile-time validations:

Public Sub Macro1()
    Dim Sheet As Worksheet
    Set Sheet = ActiveWorkbook.Worksheets("Sheet2")
    Sheet.Range("A1").Value = 42
End Sub

When NOT to Declare Variables

Variables are so nice, sometimes we declare them even when we don’t need them. There are many valid reasons to use a variable, including abstracting the result of an expression behind its value. Assuming every variable is assigned and referenced somewhere, there are still certain variables that are always redundant!

Objects are sneaky little things… not only can they have a default member that gets implicitly invoked, they can also have a default instance that lives in the global scope and is always named after the class it’s an instance of.

Declaring a local variable to hold a copy of a reference to an object that’s already globally accessible, is always redundant! Document modules (in Excel that’s ThisWorkbook and the Worksheet modules) and UserForms always have such a default instance:

Public Sub Macro1()
    Dim WB As Workbook
    Set WB = ThisWorkbook 'redundant and obscures intent!
    Dim Sheet As Worksheet
    Set Sheet = Sheet1 'redundant, just use Sheet1 directly!
End Sub

Sprinkle Generously

Variables are a simple but powerful tool in your arsenal. Using them enhances the abstraction level of your code, practices your brain to stop and think about naming things, can help prevent binding errors and remove implicit late-binding / keep your code entirely visible to the compiler and Rubberduck. Used wisely, variables can make a huge difference between messy and redundant macro-recorder code and squeaky-clean, professionally-written VBA code.

“I’m not a programmer”

July 11, 2019 Rubberduck VBA7 Comments

If you’re an accountant, a sales analyst, or in any other office position where writing VBA code helps you do your job faster, you may have uttered the words “I’m not a programmer” before, and that wouldn’t have been wrong: once the code is written, you’d tweak it every once in a while to fix a bug here or there, and then move on to do your actual job as soon as things look like they work as they should. If you’re finding that as weeks and months pass, you’re spending more and more time debugging that code, it might pay off to learn a bit more about how “actual programmers” do things, but be warned: it’s a bit of a rabbit hole – in a good way, but still, it goes as deep as you’re willing to go.

If you’re a consultant delivering advanced Excel solutions to your business clients, you may have thought, said, or written the same words, too. But you like your worksheets efficient, flexible, reliable, easy to maintain; over the years you’ve become an expert at dissecting and modeling a business problem into a fine solution that will live on and grow with the client. You are a “power user”, a professional, and it shows.

Writing code involves exactly the same identical problem-solving thought process: dissecting a problem into small steps, and modeling it into …a sequence of executable statements. When you use a SUM function in a worksheet, you assess whether the range of cells you’re adding up will need to grow over time, and you make sure it’s as simple as possible to add or insert a new row without breaking the integrity of the worksheet: you’re not just solving the problem at hand, you’re anticipating the extension points, facilitating them, making it harder to break things. You shade the background of cells intended for data entry, use borders around fields, conditional formatting, and data validation to ensure everything is obvious and remains consistent; you source data validation lists from named ranges pointing to a column in a table, so that adding new possible valid entries is easy as pie and requires no other step than… adding the new possible valid entries in the table. Next to another table that requires a particular sort order because it’s being used by dynamic named ranges that source data validation lists for co-dependent dropdowns, you might put up a very obvious formatted shape with an inner text that explains why that table needs to be sorted by this column then that column, and what happens to the associated validation dropdowns when the sort is broken.

See, you are a programmer. Worksheets are programs – even more intensely so with the amazing new features coming to Excel: dynamic arrays are changing the entire paradigm and turning the very thinking of worksheet functions into something that really isn’t very far from the mindset you’d have in functional programming.

So why is it that VBA code is so often seen as merely a sequence of executable statements then? Why is it that “it works, therefore it’s good enough” is so often where the bar is? You could have made that data validation list work off a standard range of cells in some (hidden?) column somewhere off-screen, and that would have worked too… but “well, it works” isn’t where the bar should be at, and you know it.

Depending on what the code needs to do, VBA code can become much more than just a macro once you start not just solving the problem at hand, but also anticipating the extension points, facilitating them, making it easier to maintain, and harder to break things. If writing code is part of what you do for a living, then you might as well write good code. Good code isn’t just code that works. It’s code that adheres to a number of language-agnostic principles and modern-day guidelines that even plain procedural code should follow. It’s code that Joe from accounting probably couldn’t have written by themselves, but that they could likely read and understand (at least at the higher abstraction levels), and if they know enough VBA to be dangerous, they could likely even maintain and extend it!

I’m not saying every piece of VBA code needs OOP and dependency injection and inversion of control and 20 class modules with 1 method each ought to get involved in sorting a ListObject. Just that maybe, VBA code would be a little less dreadful to the eventual IT staff inheriting it, if instead of saying “I’m not a programmer”, we cared about the quality of our code in the exact same way we care about the quality of our worksheets and dashboards, or Access databases and reports, or whatever it is that we’re doing.