August 2024 – Rubberduck News

State as a Service

August 31, 2024August 31, 2024 Rubberduck VBA2 Comments

Say you have a worksheet that contains a table with various settings, or options to run some macro with. Whatever the macro does, whatever it uses these values for, it must somehow solve this problem: how to get these values out of the worksheet and into the program?

We have a ListObject to play with, and since the table is in a specific worksheet, that’s the worksheet module we’re going to be editing. Since the very existence of that table matters for the rest of whatever this macro ends up doing, we’re going to make it clear it’s not an accident by making it a property of the worksheet (class) module – something like this:

Private Const TableName As String = "Table1"

Public Property Get SettingsTable() As ListObject
  Static Value As ListObject
  If Value Is Nothing Then
    On Error Resume Next
      Set Value = Me.ListObjects(TableName)
    On Error GoTo 0
  End If
  Set SettingsTable = Value
End Property

Property Get procedures usually do not raise an error – here if the table doesn’t exist the property will return Nothing, which should cause the calling code to blow up with error 91, and this would be reasonably expected behavior in this case. The Static local stands in for a module-scoped variable declaration that would be needed if the property had a setter procedure; because it’s only needed in one place, we can declare it locally and retain the module-scoped behavior with the Static keyword.

“Static” in VBA basically means “shared”, and “Shared” in VB.NET means what “static” means in essentially every other language with such semantics. Static as a scope modifier in .NET means a member belongs to the type (as opposed to an instance of the type in question), but in Classic-VB it is used for declaring local variables that retain their value between procedure calls, and if used as a modifier at procedure level it makes all locals behave as such… which [very likely] isn’t a good idea.

So we have now given ourselves access to the table, and we can just do Sheet1.SettingsTable to access it from anywhere.

But what if we don’t want that? If we know the settings each have a unique name and have a value that might be a String, a Double, a Date, or a Boolean. The table might just as well be empty for now, anyway.

If we don’t formalize access to the settings, then every place that needs them might be doing things differently! Imagine the chaos if sometimes an option is retrieved with Application.VLookup, elsewhere with a loop over SettingsTable.Rows, and then another one could be getting the value with an offset from the result of SettingsTable.DataBodyRange.Find, and there’s another couple of different but not always equivalently dangerous or misguided ways to go about retrieving a value from that table, and there’s no need to have all of them around.

We could write a function that accepts the name of a setting and returns a Variant holding the associated value if it exists, but what if we need to get the value multiple times, we’re going to read it off the worksheet every time?

Writing to a Range is perhaps the most expensive thing a macro can do, but reading anything from one comes close. In fact, a macro that performs well is usually a macro that limits its interactions with worksheets and the …entire Excel object model.

What we want is a method that iterates the table rows once, yielding an OptionValue class instance for each setting value. So we add this new class module and define an OptionValue class with a Name (String) and a Value (Variant) property.

But then if we call that method every time we want to get a setting value, things are going to be much worse than if we just used Application.VLookup every time, so what gives?

Grabbing every row of that table and turning them into as many OptionValue instances is an action, it wants to be a verb. Things we do with any kind of state are very often well described with a method’s name that starts with a verb, and that’s great already (especially if the rest of the name is actually somewhat descriptive), but a free-floating verb is up for grabs by anyone.

So we’re going to encapsulate it by making that method a member of a class for which SettingsService might be a good name: it’s a service that has the means to abstract away the worksheet and only expose OptionValue objects, and with it the rest of the code no longer needs to deal with the nitty-gritty details of how these objects come into existence, or how long they’ve been around.

Because we can write this class in such a way that we read the settings once from the worksheet (say, in the Initialize handler of the class itself), cache them in a private keyed collection (or private dictionary), and as long as our instance is alive we can return the cached option values whenever someone asks for them, and then they’ll be getting them without needing to hit the worksheet.

By adding an indexed property, we can even have a default member that makes sense, and the rest of the code can read its configuration like this:

Dim Settings As SettingsService
Set Settings = New SettingsService

If Settings("SomeSetting") Then
  Debug.Print "SomeSetting is ON"
Else
  Debug.Print "SomeSetting is OFF"
End If

Settings(“SomeSetting”) stands in for so many things here, all of which would distract from what this macro is supposed to be doing, however working with Variant like this is annoying, and the use of default members is abstracting away mechanics that we’d usually rather be explicit about, so we should instead expose typed methods, so we know (we/us, but also the compiler and Rubberduck here) what actual types and members we’re dealing with:

Dim Settings As SettingsService
Set Settings = New SettingsService

If Settings.GetBoolean("SomeSetting") Then
  Debug.Print "SomeSetting is ON"
Else
  Debug.Print "SomeSetting is OFF"
End If

Where GetBoolean being a method/function rather than a property should make us feel much better about throwing errors: if the setting doesn’t exist, we blow up. If the setting exists but its Variant subtype isn’t Boolean then we probably want to blow up rather than return gibberish. If it exists and it’s of the expected data type, we return the setting value, converted to an actual Boolean.

That means SettingsService also needs GetDate, GetDouble (and maybe GetInteger), and of course GetString, leaving the Variant values completely encapsulated in the service: callers don’t need to care about any of that, and that’s neat.

There is no Worksheet

The only thing that needs to do anything with SettingsSheet is the SettingsService. Nothing else needs to access it for any reason whatsoever, because we have a service that fully abstracts it away, so it might as well not be there.

And the macro should still work, assuming it knows how to deal with and recover from a missing setting value.

Settings could be moved to a flat file, another workbook, or a database, and only one method would need to change: the one that’s reading and caching the settings from the worksheet, would instead be connecting and querying a database – and would still only need to hit it once.

And none of anything else would need to change, because it’s all completely yielding that responsibility to this service.

Compare to what it would be like to change inline VLookups and Range.Find calls (wherever they are) to read from another source, and you can quickly see the benefits of having sane abstraction levels.

Code that desperately wants to control how everything is done at the lowest level of detail, is tedious and heavy. It’s hard to tell what the role of such a procedure is, because too many things are going on and the signal gets drowned in noise.

There doesn’t need to be a service class, or even an OptionValue class: any distinct procedure scope that’s clearly responsible for retrieving a valid setting value is a good step forward. But moving the state into an object makes it easier to control its lifetime, and by encapsulating behavior we clean up the calling modules the same way extracting procedure out of a larger scope cleans up that larger scope. It reduces the cognitive load and complexity by moving away code that’s concerned with the peripherals of any given macro’s purpose, and thus increases the cohesion of the macro’s module because things that aren’t directly related to what the macro is specifically responsible for, are simply elsewhere, where they belong.

When to MsgBox?

August 18, 2024 Rubberduck VBA3 Comments

Perhaps the most popular function in the VBA standard library, MsgBox is likely how your first VBA “hello world” displayed its triumphant, hopeful message to a would-be user. A custom message dialog built with a UserForm would have the same purpose: the program needs to convey or receive information from the user, …or does it?

TL;DR

As a programmer it’s often useful to follow/trace execution in details, and it’s easy to think the user is also interested in this information… but when a program is overtly verbose, users tend to just OK every prompt, without paying much attention to what’s going on. Keep messages short, and use them sparingly. Treat every single one of them as interrupting the user and interfering with whatever it is that they were doing; is it that important to show this message now? Or at all?

MsgBox

As mentioned earlier, MsgBox is a function that you will find in the VBA.Interaction module. Being a function, it returns a value: a VbMsgBoxResult enum (integer) that represents the button that the user used to dismiss the dialog. The title is the name of the host application unless parameterized otherwise, and the icon and buttons can be configured in preset combinations, making it a quick and easy way to prompt the user or ask for confirmation, or to let them know something went wrong when the program crashes gracefully.

All modules in the standard library are global, so you don’t need to qualify calls to their members; that’s why and how MsgBox, VBA.MsgBox, and VBA.Interaction.MsgBox are all equivalent. Because they’re scoped functions and not language-level keywords, their calls are resolved using the standard scoping rules, so they can be shadowed by a same-name identifier in the current project. If a project defines a MsgBox function, then unqualified MsgBox calls will resolve to the project-defined function, because anything defined in the current project has a higher resolution priority than any referenced library – even if that’s the VBA standard library.

Knowing that it’s a function doesn’t mean we’ll necessarily want to systematically capture its return value – but it should eclipse any ambiguity around whether parentheses are needed or not: it’s exactly the same rules as any other function call! To be clear, that means the parentheses are needed when you are capturing the result, and they’re not when you’re not. The catch is that when they’re not needed, they’re also actively harmful when they’re present.

The deal with the parentheses

The language syntax uses parentheses to delimit argument lists. Parentheses are however also perfectly usable in any expression, including argument expressions. When you invoke a procedure, you need to know whether you’re interested in capturing a return value or not. If you’re calling a Sub procedure, then there’s no return value; Function and Property Get procedures normally have one.

If you’re NOT capturing a return value, you DO NOT use parentheses.
If you are capturing a return value, you need parentheses… unless it’s a parameterless call.
Using a function call as an argument expression or a conditional expression means you are capturing the result.

❌ result = SomeFunction arg1, arg2

✅ result = SomeFunction(arg1, arg2)

The VBE will suppress () superfluous empty parentheses, but if you’ve surrounded an argument list with parentheses that shouldn’t be there, then the VBE has added (and will stubbornly maintain) a space before the opening bracket that’s your clue that the VBE has parsed your argument list as an argument expression, and the only reason it worked is because your argument list is made up of a single argument. Indeed, since the comma isn’t an operator in any possibly recognizable expression, it would be a syntax error to try to pass a second argument without first closing the parentheses around the first one – but then while legal, it looks pretty silly to just wrap each argument with parentheses for no reason:

❌ DoSomething (arg1, arg2)

🤡 DoSomething (arg1), (arg2)

✅ DoSomething arg1, arg2

✅ Call DoSomething(arg1, arg2)

Sticking a Call keyword in front of a procedure call makes the parentheses required, because then when you think about it, it’s the procedure call itself that becomes the implicit argument list, as if we were invoking some Call procedure and giving it DoSomething as an argument. Not how it happens, but definitely how it reads if we ignore the keyword-blue syntax highlighting.

Communication Tactics

Whenever you are about to use MsgBox, ask yourself if there wouldn’t be another, less disruptive way to inform the user of what’s going on: sometimes a status message in Application.StatusBar gets the message across without requiring interaction from the user.

Because we tend to think of MsgBox as just popping a message, but every message we show has to be [ideally read and] dismissed by the user, …and that’s OK sometimes, but it’s better to avoid it if we can.

Some points to keep in mind:

They’re not going to read it.
If it’s more than one sentence, it’s probably too long.
If something went wrong, tell the user what they can do about it; don’t just show some cryptic error message with a scary icon.
They’re not going to read it.
If something completed successfully and the application is already in a visual state that makes it apparent, then it’s not needed.
They’re not going to read it.

Instead of using a MsgBox, consider:

Listing or logging operations in a dedicated worksheet/table as they occur, so detailed information is available if it’s needed, but without interrupting the user for it.
Implementing a UserForm or using a worksheet to capture execution parameters beforehand (define a model!), instead of repetitively prompting for each piece of information that’s needed.
If a macro must make a user wait, change the cursor accordingly; resetting the cursor back to normal when the macro is done, already signals completion.

If you must show a MsgBox...

They’re not going to read it.
Avoid the “?” vbQuestion icon; it was deemed a confusing icon a long time ago and is very uncommon to see in a serious application.
Write prompts as clear, concise, yes/no questions; prefer vbYesNo buttons, because OK/Cancel buttons necessarily require a more careful reading of the prompt.
They’re not going to read it.
Ask yourself if it’s really that important.

An otherwise excellent macro could always easily be turned into the most annoying thing with just a couple of misplaced MsgBox calls: communicating to the user is an important thing to do, but just because it’s something MsgBox does, doesn’t necessarily have to mean it’s something MsgBox must do.

Understanding Libraries

August 14, 2024August 14, 2024 Rubberduck VBA3 Comments

Once upon a time I was quite active on Stack Overflow, and then there was a discussion on the meta site about the various “flavors” of VBA and I won’t get into the details here, but I remember it irked me that a bunch of very meticulous, very technical and often terminally pedantic folks would be conflating the language itself with… just some of its common libraries.

VBA – the programming language – is just VBA. One could reasonably argue that the standard library can be considered a part of it, but that’s where I draw the line: there’s no “Excel VBA” or “Word VBA” or “Access VBA”… calling different member calls into different objects from different libraries a different language altogether makes no sense to me. It’s the same grammar and syntax, it’s the same runtime, and host-agnostic code will run exactly the same in every conceivable host application (and there are many more than just the handful of Microsoft Office ones!): a programming language is not defined by the libraries it binds at compile time (nor at run time for that matter), but its type system, its syntax, and its semantics define it unambiguously – and all of these are all identical in all conceivable “flavors” of VBA… because that is what makes it VBA: its language specifications.

What’s different in different hosts, is the objects you’re working with and even then, that’s limited to the default references (arguably should be an… IDE setting?) while assuming you’re not automating Excel from an Access project, for example: the objects you’re working with depend on what your code needs to do, and so everything is just plain VBA no matter how you might want your favorite host application to be special. No flavors, it’s all plain vanilla.

Library rhymes with Vocabulary

Libraries are a special type of executable file that define types and their members, and then other executables can dynamically link these types and invoke these methods, so we call them dynamic link libraries, or DLLs. They are abstractions for various APIs, and they quite literally extend the vocabulary of a program with nouns (types/classes) and verbs (methods) that do color the expression of the language, but yeah it’s just colorful vanilla. I mean your own classes and methods’ names do exactly that, too – right?

COM libraries act as servers and they use and expose a specific set of interfaces that all COM clients (COM is all about client/server) understand, so VBA can reference and use such libraries, because VBA is built on top of this technology. COM is the Component Object Model, which is a development toolkit for Microsoft Windows that was very popular in the 1990s, until the .NET Framework changed the landscape in the early 00s. The model was language-independent, but not platform-independent like today’s modern .net is, so it’s particularly impressive to see VBA code running on a Mac, but I digress. The cool part is that whenever a VBA project compiles, a COM library gets generated in-memory within the host process, and when we run VBA code, what’s actually running is the compiled library – the VBA source code does not get interpreted in the editor as such.

In fact the VBA source code in the editor is basically decompiled from the p-code tokens generated by compiling the original source code and that is why the VBE always seems to “autocorrect” when you validate a line 🤯

Browsing Libraries

The language implements the COM type system but hides pointers and invocation mechanisms (IDispatch and IUnknown interfaces), and (somewhat hidden) attributes control type and member metadata, including documentation. VBA projects can leverage all of these capabilities with or without Rubberduck, but Rubberduck does make it much easier to manipulate the hidden VB_Attribute statements.

The VBE’s Object Browser reveals everything, even hidden members (fun fact, VBA doesn’t honor the “hidden” flag for user code like it does for referenced libraries, but VB6 did). Right-click anywhere and make sure “Show hidden members” is checked, and then they’ll show up here and in name completion lists. In the search/filter section there’s a dropdown listing all loaded libraries: notice your own compiled VBA project is listed there.

Global (namespace) pollution

Everything that’s public in a library, becomes part of the global namespace once it’s loaded into a runtime environment – so what happens when two libraries define classes with identical names? Sure a good practice is to qualify library types (e.g. Excel.Application, Word.Range, etc.) so that the code is clear and works as intended, but how does it still work when we don’t? Or does it?

Turns out, the up/down arrows that let you move libraries up and down the list in the add/remove references dialog, is the answer: referenced libraries are prioritized in order, which is why the VBA standard library and the host application’s object model library are stuck at the top (in that order) while everything else is allowed to move up and down.

When VBA binds a Range declaration, if the host is Microsoft Word then an unqualified Range is always a Word.Range, but if the host is Microsoft Excel then an identical declaration would bind to Excel.Range, even if the Word and Excel libraries are referenced in both cases, because the library provided by the host application always has a higher priority than a library reference that was added afterwards. Things get more hairy when you’re in an Access project that references DAO and you want to use ADO and suddenly you’ve rewritten [DAO.]Recordset into [ADO.]Recordset just by accidentally flipping the libraries’ respective priorities.

Know Your Tools

Whenever you face a new library – be it a new host, Scripting, or regular expressions, using the Object Browser to get acquainted with the types and members can often be all the direction we need to know exactly what to look for in the documentation (or Stack Overflow), and it should be exactly what a certain “AI” chat bot pulls for you (ideally with some reference links) if you ask it the same.

The VBA standard library provides a small set of basic functions and constants that are necessarily available to any VBA project, so a host-agnostic module can use them freely. This library is the closest thing to a framework VBA gets out of the box: the functions are mostly wrappers around Win32 API calls, making it easy to display message boxes, manipulate strings and dates, and to perform a number of basic math and trigonometry operations.

As a VBA developer, it is absolutely essential to understand what the VBA standard library does (and what it does not), because it’s easy to reinvent the wheel if you don’t know it’s already there. It’s a very small library compared to the .NET Framework, and as such it’s very much bare-bones – which means it’s good material for a backbone to build upon. Take the time it takes to play with each of the string and date functions, take an eyeball to the math functions so you know what’s there when you need it, and then explore the rest of the library and lookup the documentation for anything you don’t understand: the VBA standard library is one of your primary tools, it’s a good idea to know how to wield it.

The Excel library is much larger, but manageable if you focus on Application, Workbook, Worksheet, Range, and Name classes and their members, and learn the rest from the documentation and tutorials/examples as needed. But then there’s something sneaky the library does, that leverages these COM features we discussed earlier. Some hidden classes are taking members that would belong to the Application, members that belong to the active Workbook and Worksheet, and punts them into global scope “for your convenience”.

And that’s where everything went wrong in my opinion: these “convenient” members are a trap, and they obscure what’s going on so much that it’s often impossible to know exactly what VBA code does without additional context, such as whether the presented code is written in a worksheet module or elsewhere (perhaps in a standard module). It is my opinion and that of many other professionals, that good code does what it says, and says what it does. And this isn’t it.

It’s in the docs!

Not going to happen in a million years, but Microsoft should break badly written Excel macros by deprecating and removing the magic ActiveSheet members from the hidden _Global interface. Ok maybe not. But ugh.

VBA is a simple language on the surface, but it’s easy to confuse some keywords with standard library functions (quick: LBound/UBound, keywords or functions?), and even some standard library functions with some in the Excel library (InputBox springs to mind). Add implicit default members to the mix and you get code that can really throw you a wrench and make you wonder what you’ve been drinking. “The same code worked perfectly just yesterday”, and “it works fine normally, but errors when I step through with the debugger” are all too common, and always directly caused by all the ever-so-convenient beginner traps.

One part of the problem is that many self-taught and learning developers do not necessarily know not to read too much into examples in technical documentation: if you’re reading about the Worksheet.Range method, the examples aren’t there to show you how to, say, properly declare variables, or pass inputs as parameters – and so they don’t… but that doesn’t mean your code should look exactly like the documentation examples! Macro recorder code is just the same: just because it works off implicit ActiveSheet references does not mean good VBA code should be doing that. The macro recorder can quickly teach you what types and members are involved in making a table out of a range of cells, but it’s not meant to show you how to do that beyond the context of the active sheet and the current selection, which are two things you’re actually going to want to be very explicit about in your own code. This goes on for virtually every topic under the Sun: unless you’re specifically reading something about best practices, don’t try to read naming conventions or particular ways to do things from small contrived examples that mean to depict and highlight a single other entirely different thing. Sometimes I wonder how different the world would have been if all examples were written as standalone functions that explicitly take all their inputs as parameters…

Both the VBA standard library and the Excel object model are extensively documented, but the Excel library and a lot of its examples are often making it hard to find the right place to look at when there’s a problem, because of all the implicit shenanigans: it’s not that VBA itself is making all the wrong design decisions here – but the Excel library is leveraging COM/VBA features in such a way that it remains trapped in a very 1990’s notion of what “convenient” means, for the better or worse.

🫶✌️