There is no worksheet.

Your VBA project is embedded in an Excel workbook. It references the VBA standard library; it references the library that exposes the host application’s (i.e. in this case, Excel’s) object model; it includes global-scope objects of types that are declared in these libraries – like Sheet1 (an Excel.Worksheet instance) and ThisWorkbook (an Excel.Workbook instance). These free, global-scope objects are right here to take and run with.

You’re free to use them, wisely.

True, they’re global – they can be accessed from anywhere in the code.
They can… and that doesn’t mean they should.

And if you’re willing to do whatever it takes to abstract away the host application’s object model in your “business logic”, then you can isolate your logic from the worksheet boilerplate and write pretty much the same code you’d be writing in, say, VB.NET… or any other object-oriented language for that matter.


 

Abstracting Worksheets

There is no worksheet. There is data. Data going in, data going out: data is all it is. When the data is coming from a database, many programmers immediately say “I know! Let’s implement a repository pattern!“, or otherwise come up with various ways to abstract away the data-handling boilerplate. If you think of worksheets as data, then it’s not any different, really.

So we shall treat worksheets as such: data. What do we need to do with this data?

Some tried to make worksheets Implements interfaces, and ran into issues (here too, and oh another). I completely agree with this post, which basically boils down to don’t.

Whatever you do, don’t make worksheets implement an interface.

Wrap them instead. Make a proxy class implement the interface you need, and then make sure everything that needs anything on a worksheet, accesses it through an interface, say IWorkbookData.FooSheet, where FooSheet is a property that returns a FooSheetProxy instance, exposed as an IFooSheet.

Diagram

The only thing that ever needs to use a FooSheet directly, is a FooSheetProxy.

I don’t know about you, but I don’t like diagrams. So how about some real-world code instead?

Say you have an order form workbook, that you distribute to your sales reps and that come back filled with customer order details. Now imagine you need a macro that reads the form contents and imports it into your ERP system.

You could write a macro that reads the customer account number in cell O8, the order date in cell J6, the delivery and cancel dates in cells J8 and J10, and loops through rows 33 to 73 to pull the model codes from column F, and the per-size quantities in columns V through AQ… and it would work.

…Until it doesn’t anymore, because an order came back without a customer account number because it’s a new customer and the data validation wouldn’t let them enter an account that didn’t exist at the time you issued the order form. Or you had to shift all the sized units columns to the right because someone asked if it was possible to enter arbitrary notes at line item level. Or a new category needed to be added and now you have two size scales atop your sized units columns, and you can’t just grab the size codes from row 31 anymore. In other words, it works, until someone else uses (or sees) it and the requirements change.

Sounds familiar?

If you’ve written a script-like god-procedure that starts at the top and finishes with a MsgBox "Completed!" call at the bottom, (because that’s all VBA is good for, right?), then you’re going to modify your code, increase the cyclomatic complexity with new cases and conditions, and rinse and repeat. Been there, done that.

Not anymore.

Name things.

Abstraction is key. Your code doesn’t really care about what’s in cell O8. Your code needs to know about a customer account number. So you name that range Header_AccountNumber, proceed to name all the things, and before you know it you’re looking at Header_OrderDate, Header_DeliveryDate and Header_CancelDate, and then Details_DetailArea and Details_SizedUnits named ranges, you’ve ajusted your code to use them instead of hard-coding cell references, and that’s already a tremendous improvement: now the code isn’t going to break every time something needs to move around.

But you’re still looking at a god-like procedure that does everything, and the only way to test it is to run it: the more complex things are, the less possible it is to cover everything and guarantee that the code behaves as intended in every single corner case. So you download Rubberduck and think “I’m going to write a bunch of unit tests”, and there you are, writing “unit tests” that interact with the real worksheet, firing worksheet events at every change, calculating totals and whatnot: they’re automated tests, but they’re not unit tests. You simply can’t write unit tests against a god-like macro procedure that knows everything and does everything. Abstraction level needs to go further up.

The [Order Form] worksheet has a code-behind module. Here’s mine:

'@Folder("OrderForm.OrderInfo")
Option Explicit

Yup. That’s absolutely all of it. Your princess is in another castle. A “proxy” class implements an interface that the rest of the code uses. The interface exposes everything we’re going to need – something like this:

'@Folder("OrderForm.OrderInfo")
Option Explicit

Public Property Get AccountNumber() As String
End Property

Public Property Get OrderNumber() As String
End Property

'...

Public Function CreateOrderHeaderEntity() As ISageEntity
End Function

Public Function CreateOrderDetailEntities() As VBA.Collection
End Function

Public Sub Lockdown(Optional ByVal locked As Boolean = True)
End Sub

Then there’s a proxy class that implements this interface; the AccountNumber property implementation might look like this:

Private Property Get IOrderSheet_AccountNumber() As String
    Dim value As Variant
    value = OrderSheet.Range("Header_Account").value
    If Not IsError(value) Then IOrderSheet_AccountNumber = value
End Property

And then the CreateOrderHeaderEntity creates and returns an object that your “import into ERP system” macro will consume, using the named ranges defined on OrderSheet. Now instead of depending directly on OrderSheet, your macro depends on this OrderSheetProxy class, and you can even refactor the macro into its own class and make it work against an IOrderSheet instead.

What gives? Well, now that you have code that works off an IOrderSheet interface, you can write some OrderSheetTestProxy implementation that doesn’t even know or care about the actual OrderSheet worksheet, and just like that, you can write unit tests that don’t use any worksheet at all, and still be able to automatically test the entire set of functionliaties!


Of course this isn’t the full picture, but it gives an idea. A recent order form project of mine currently contains 86 class modules, 3 standard modules, 11 user forms, and 25 worksheets (total worksheet code-behind procedures: 0) – not counting anything test-related – and using this pattern (combined with MVP), the code is extremely clear and surprisingly simple; most macros look more or less like this:

Public Sub AddCustomerAccount()
    Dim proxy As IWorkbookData
    Set proxy = New WorkbookProxy
    If Not proxy.SettingsSheet.EnableAddCustomerAccounts Then
        MsgBox MSG_FeatureDisabled, vbExclamation
        Exit Sub
    End If

    With New AccountsPresenter
        .Present proxy
    End With
End Sub

Without abstraction, this project would be a huge unscalable copy-pasta mess, impossible to extend or maintain, let alone debug.

See, there is no worksheet!

25 thoughts on “There is no worksheet.”

  1. Hello, thanks for continually posting new material, and I especially appreciate the object oriented material! I continually check your website for new material, but have never posted anything since I am still new to OOP and slowly getting the hang of it. I’ve been working with VBA full time for the past 4 years, and want to shift into using OOP as opposed to creating “script-like god-procedures”. I’ve tried to create a very generic version of what you’ve explained here, but I’m having a lot of trouble.. Is there anyway you could provide an example workbook with what you’ve explained here?

    Like

  2. I second the request for a simple sample of this theory. I have tried implementing myself but have not been able to do so. I get a bit confused as it seems the sample code is a mixture of abstracted examples (foo, iWorkbookData) and examples from a specific project (ISageEntity).

    Thank you for all of your posts on this. This Userform1.Show post was extremely enlightening!

    Like

    1. I’ll try to squeeze some time to whip up a sample project in the next couple of days, next week the latest… Possibly with a follow-up post digging into some of the implementation details.

      Like

      1. Hi Mathieu! I am writing myself some “instructional notes” on the WorkbookProxyExampleDialog file and the concepts therein. In the ExampleDialog, WorkbookProxy, ExampleModel, and SomeModel classes you use a Private Type and a “this As ” the Type (TModel, TView, TProxy, etc), but in ExamplePresenter you did not use that convention. Instead, you use a module level variable “wbProxy”. Is there a reason for the different approach or simply another way of doing things, but could have been done using the Private Type approach?

        Like

      2. Hey Smiley, thanks for noticing! It’s probably just an oversight, since that was really just a throw-away example to illustrate things.. OTOH, the private type approach isn’t compatible with WithEvents variables, so I’d say it’s best used and most convenient in “model” classes that encapsulate data and provide slim functionality. A presenter class like that example doesn’t really hold any state, and could reasonably need to declare WithEvents variables – not saying that’s what’s justifying the apparent inconsistency here (I honestly don’t remember!), only that upon reflection it doesn’t feel entirely wrong either. Cheers! …have you seen the v2.4.0 release?

        Like

      3. I just downloaded 2.4 and will be installing in a few minute! Looks to be awesomer than 2.3…
        I know the example was a “throw away” (kinda sorta), but for me it has been a great learning tool. Since I am a “picture-oriented” person I have made a diagram that is helping me see the bigger picture while providing insights into salient elements (hence my noticing the slightly (but not really) inconsistent (non) use of Type. I do like the idea that as a good practice using the private Type is best used in data (Model) classes.
        Thanks for sharing your thoughts!

        Liked by 1 person

      4. great site, great examples, and great comments, all with a wealth of information. Certainly, this all represents a great start to OOP. I can download and/or replicate your examples; however, what would be helpful is ‘comments’ included in the files, explaining connections, reasons, etc. Of course, I can step through the code, but I have a lot of questions along the way. For example, why set objects one one class vs another class or standard module? could be ‘doesn’t matter’, or to provide better flexibility/scalability, or unit testing, doesn’t work else where, or depends. Anyway, thanks a lot!

        Like

  3. I also want to support the request of Neil and Diederik to provide sample (Excel) Files, not only for this article, but also for “How to unit test VBA code?” and “UserForm1.Show”.
    Please keep enlightening us with your really great articles! Many thanks in advance.

    Like

  4. Hi Mathieu. I have an idea related to this article, and would like to know if this is something you have considered yourself (and decided not to do), or if you see any drawbacks to my method compared to using the Name Manager capabilities.

    I have one class (ApplicationNames) dedicated to hold the useful ranges found in my application (or several smaller classes when the application is big: InputSheetNames, OutputSheetNames, etc.), so for instance I can have properties like:
    – HeaderRowNumber: which returns the row number of my header for a small application or for one particular data sheet in a big application.
    – IDColumnNumber: which returns the number of my ID column.
    – IDColumn: which returns the whole column content.
    – ExecutionMode: that provides direct access to the cell holding this particular parameter.
    – ItemDataRange(itemRow): this one allows me to reference a whole row (itemRow) in a given data range of interest.

    These properties allow me to access relevant worksheet information without having to rely on the Name Manager and the naming of ranges for that particular purpose. This way I can keep all my code and business logic in just one place, the VBE. This is the main benefit for me, because it’s easier to understand everything related to my application by just looking at the code!

    If I need a reference of some kind to my worksheets, I know it has to be produced by a class of this type. Any changes to the relevant elements of the application would be reflected in the properties of this class. Granted, changes like adding a column will probably force me to adjust some of these properties, but my unit tests won’t let changes go unnoticed too long. And I can minimize the impact of such things by defining things like this: ˋSomeColumnNumber = PreviousColumnNumber + 1ˋ, so that introducing a new column will only require to adjuts the previous and next columns.

    What do you think?

    Like

    1. That’s similar to what I do. For example I can have an ‘OrderHeaderDate’ named range, and then an ‘OrderHeaderDate’ property that gets me the date value without needing to wonder which cell that might be in, or even which named range might be holding it. For columns in a table I like to hard-code header names and determine their index programmatically, so adding columns or reordering them doesn’t break anything.
      If the logic needs to be testable without hitting an actual worksheet though, that won’t be enough; that’s where the proxy types come in. I’ll have an interface with e.g. my ‘OrderDate’ property, and the implementation will pull it from the worksheet property I mentioned above (which is already a level of abstraction higher than ‘sheet1.Range(“OrderHeaderDate”)’), but then this level of abstraction allows for a stub class that can implement the same interface and give me the same date (or a configurable one) that my tests can work with, without hitting the actual worksheet. I also have an ‘IDataSheet’ interface that I use to abstract away a table on a sheet, while maintaining the ability to read and write to it; e.g. for a ‘Categories’ table I’ll have a ‘CategoryModel’ class representing a record, and a method on ‘IDataSheet’ returns a collection that the sheet-proxy implementation populates by creating a new ‘CategoryModel’ instance for each row. Then the ‘SalesReps’ table on another sheet is abstracted using the same interface, but the wrapper/proxy for that one populates the collection with ‘SalesRepModel’ instances. Then the rest of the code works with ‘CategoryModel’ and ‘SalesRepModel’ classes, which it gets from ‘IDataSheet’ implementations that ultimately read from the tables on the worksheets, but then I can always give it a stub that returns test models that came from the test setup code instead of the worksheets, and if I had a database to get this data from, then I’d just make a new implementation that gets me the models from a database instead, and none of the rest of the code would need to change. Hence, “there is no worksheet” =)
      Tests hitting worksheets aren’t fully self-contained, and can break if the sheets start breaking assumptions – which is a good thing to test in some circumstances indeed, but in this article I meant to illustrate how the code could be written so as to completely remove worksheet-specific concerns from the code, through abstraction and indirection.

      Like

      1. Yep, this seems close enough. I never thought about the IDataSheet interface though, but now I clearly want to implement this idea, as I can see the benefits. I already have SalesRepModel and CategoryModel classes (mine are BufferModel, HistoryRecordModel, etc.), and having the interface will allow me to define base functionality for this type of classes, use stubs for testing, etc.

        I have one remaining question though: is there a good reason why I would want to switch back to using named ranges (and the Names Manager) instead of just cell references hard coded directly in my IDataSheet instances? I see the clear drawback: as said in my previous comment, they make me hold my business logic in two places, which I is less clear and convenient. I just can’t seem to see a big benefit to overcompensate for this.

        By the way, thanks for being so generous with us, the VBA community. Keep up the great work with Rubberduck, the blog and the Stack Exchange answers!

        Like

      2. I think named ranges are still useful; they somewhat shield the code from changes when the sheet structure needs to change or cells need to move around. I like to avoid having specific range addresses in code, at least for ranges that can be named (and then referred to by name). OTOH I could be over-abstracting things lol

        Like

  5. Thanks Mathieu!

    I would really appreciate if you can dedicate a future blog post to discussing in more detail the IDataSheet concept. I think its very powerful and I would like to know a bit more about the details of how you go about implementing it!

    Liked by 1 person

  6. I also would like to do so, but unfortunately it seems that you (Mat) still didn’t find the time to provide an example workbook as already was requested in the comments in December …
    (Please see this reply as (another) reminder.)

    Like

  7. […] Notice these aren’t doing anything really – they merely work out a way to talk to another component – see, making a worksheet (or any document module class) implement an interface is a very bad idea: don’t do it (unless you like to crash the host and lose everything). So instead, we make another class implement the “view” interfaces, and make that class talk to the worksheet – a bit like we did in There is no worksheet. […]

    Like

  8. Mathieu, I plan on implementing this approach for a project of mine. This project is an Analysis form that shows employee metrics in several areas within their job function.

    The analysis displays 100+ metrics scattered throughout cells, in a 8 1/2 X 11 sheet. Each metric has its own category and sub categories. The source of the data is a table record-set that has employee IDs and lists all the metrics for each category and sub category.

    Question: “Should I create separate views for each metric category?” I don’t want to have many “Public Property Get” on one interface.

    Like

    1. Normalize the data: instead of having a set of properties per metric, and dozens of such groups on the interface, define what a metric is in its own class – that would be an ID/name, category/subcategory, and a value; now when you model the data your read from the sheet, you get 100 instances of a Metric class for each employee. Do the same for other groups of properties (not clear if applicable), and you end up with an EmployeeRecord class with a very reasonable number of properties.

      Liked by 1 person

      1. Thanks Mathieu! I’m going to work on this, ill post something on CodeReview. Hopefully, the post will make it your way for review 😊

        Like

      2. Mathieu, having trouble coming up with a pattern that implements your explanation.

        Would you mind elaborating your answer with a bit more detail please

        Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s