700+ Issues?

The project’s GitHub repository currently has over 700 open “issues”… but 500 of those are all the wonderful things you/me/we would like Rubberduck to be able to do: we carefully label each one of them as they’re opened, which makes these 700-some “issues” more like “tickets” or “work items”, sometimes even “brainstorming records”. We love feedback and ideas, and we’re usually quite fast to respond to new issues, too!

That said yes, there is a metric ton of things that need tweaking. Here’s the more important stuff that’s going to be relevant throughout the v2.x cycle:

Problem: Access violation (crash) on shutdown

Tearing down a managed (.net) VBE add-in is hard. There are so many important things we easily take for granted when writing .net code, that fly out the window when the .net runtime itself is spawned from your DLL being loaded by an unmanaged COM process. We found the hard way that the Runtime Callable Wrappers .net is helpfully creating, appear to assume the COM objects are owned by .net: if a destructor didn’t get to run, or if garbage collection (which is undeterministic) didn’t complete before the VBE decides it’s over, it’s over: either there’s an access violation, or the host application closes, but the host process remains running, stuck, still consuming all the leaked memory.

Solution: Rubberduck solved a good 95% of the teardown issues since the release of v2.0, thanks to the impressive efforts of the project’s awesome contributors. COM objects and RCWs are now almost under complete control, window subclassing is centralized; we’re all careful to dispose the COM wrappers we create, and we review each others’ pull requests so as to increase the chances of picking up any oversight before the changes are merged.

What’s left is on the managed (.net) side of things: we still need to clear up the Code Explorer‘s ViewModel, it’s holding on to the parser state, which ultimately prevents its proper garbage collection. I haven’t had any problems exiting Excel in months at this point, but Access doesn’t seem to be as smooth; we need to sort out these object references and dependencies to get a 100% clean exit in all hosts. It’s in sight.

Problem: Memory (RAM) consumption

Rubberduck parses an average-size module in under a second, and then traverses its parse tree and proceeds to resolve all identifier references in sub-second time as well. This amazing performance has a price: we cache every single publicly accessible declaration in every single referenced type library, and then expose them to our resolver through various dictionaries, each using different types of keys – an empty Excel VBA project might have well over 30K declarations in memory, cached a dozen times each so that the resolver and static code analysis can access any one of them pretty much instantly. As a consequence, an average Excel VBA project might consume ~400MB RAM, as opposed to ~30MB without Rubberduck.

Solution: since the beginning of the project, the primary concern had been to get something that works, then to get something that works fast… memory footprint has constantly been left behind – any program is a trade-off between performance and memory. We “just” need to find a way to give memory footprint more priority. The most realistically implementable solution we thought of, that wouldn’t impact performance much, is to “simply” store the cache in an embedded SQLite database, and use actual SQL queries to lookup declarations. This demands a rather major overhaul of every feature that needs this data though.

Problem: Inspection false positives

The parser now handles literally every legal bit of VBA (and VB6) code we can think of throwing at it. The resolver, which is responsible for resolving the identifiers we find in the parse trees to a declaration that’s in scope in the context of the code being processed, is more complicated. See parsing is a mere grammatical analysis; resolving pretty much literally involves implementing the language specifications, and this means knowing everything there is to know about the internals of the type libraries being referenced – and then implementing the clever hacks VBA does with some of them, too.

If the resolver (and its dependencies, e.g. the part responsible for scanning the referenced type libraries) isn’t 100% perfect, then the code that requires this insight (refactorings, inspections, etc.) can’t work as intended either. Barring actual bugs in the inspections themselves, resolver issues are responsible for most false positives.

Sometimes it’s a matter of visibility, too. Rubberduck is mostly host-agnostic, meaning it doesn’t care whether it’s running in Excel, Access, PowerPoint, Outlook, AutoCAD, CorelDRAW, Sage 300, or any other one of the 200+ possible VBA host applications; we’re not scanning the “document modules” (e.g. worksheets) through the host object model. This means if you have a button on a worksheet that invokes DoSomething, and no other VBA code is calling that procedure, then Rubberduck can’t know that the procedure is actually in use.

Solution: we could have some host-specific logic to scan the host document modules and associate the controls with public procedures / functions, say, in a Microsoft Access, Excel, Word, or PowerPoint host.

I like how meta it is that we need to keep polishing our own code to better help our users polish theirs!

Until Rubberduck is perfect, false positives are going to happen. All static code analysis tools have varying degrees of understanding of the code and language; the AAA-grade ones have the best grasp at what’s going on, but it’s not the VBA compiler – the only certainty is that something will be off.

That’s why extending Visual Studio with a Roslyn analyzer that flags potential code issues is so nice: you are working with the compiler. Rubberduck has to do it the old way, with the VBIDE Extensibility API type library for a starting point, and the text content of the modules to understand the code in that string: we’ve come quite a long way already!

For these situations we have a special ‘@Ignore annotation that the “ignore once” quick-fix can add to the code, so that it will be skipped when inspection results are refreshed.