Jonathan Harris
Founder @ Total Portfolio Project
November 2022

@John Berger, great questions! I think it really depends on what you think the errors in play ‘look like’, and what the risks are if you make the wrong decision. I can see different ‘ratings math’ doing well in different contexts. The more I think about this the more curious I get about the specifics of how everyone else is thinking about it.

This discussion, where we’re sharing our current ‘answers’, seems like a good starting point. But it doesn’t feel like it will be enough to convince anyone one way or the other.

One idea that might work as a next step is to put together a commonly agreed pool of example (hypothetical) investments – with a wide mix of scores on the impact dimensions and other characteristics – and then apply different approaches to them. We could also stress test the approaches by adding different errors to the dimensions. Different investments would rank differently under each approach and perform differently under the stress tests.

The result would be that rather than just saying ‘I multiply’ or ‘I add’, we could say ‘I use a weighted-average approach because it makes sense to me AND it worked best for my asset class in the Impact Frontiers ratings stress test’.
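
To make that concrete, here's a very rough sketch of what the stress test could look like in code (the pool, the noise model, and the two approaches compared are all made-up placeholders, not a worked-out methodology):

```python
# Rough sketch of the stress-test idea: score a pool of hypothetical
# investments two ways, perturb the dimension scores with noise, and
# count how often each approach's top pick changes. Everything here
# (pool size, 1-5 scores, Gaussian noise) is an illustrative assumption.
import random

random.seed(0)

def weighted_average(scores):
    return sum(scores) / len(scores)  # equal weights, for simplicity

def multiply(scores):
    result = 1.0
    for s in scores:
        result *= s
    return result

# Ten hypothetical investments, each scored 1-5 on four dimensions.
pool = [[random.randint(1, 5) for _ in range(4)] for _ in range(10)]

def top_pick(rating_fn, noise=0.0):
    noisy = [[max(0.0, s + random.gauss(0, noise)) for s in inv] for inv in pool]
    return max(range(len(pool)), key=lambda i: rating_fn(noisy[i]))

for fn in (weighted_average, multiply):
    baseline = top_pick(fn)  # noise-free 'true' top-ranked investment
    flips = sum(top_pick(fn, noise=1.0) != baseline for _ in range(1000))
    print(f"{fn.__name__}: top pick changed in {flips}/1000 noisy trials")
```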

Thoughts? Anyone want to see if we can do something like this? Or other ideas?

John Berger
COO @ Women of the World Endowment
November 2022

Jonathan

It might help if I share some of my biases.  This is long but it will help explain why I am asking these questions.

I’ll start with a concern that stems from some of the trends I have seen in nonprofit evaluation. In the nonprofit funding world, there has been a strong trend toward the idea that evidence in numerical form is better than qualitative evidence, and thus a strong push by funders to require nonprofits to report quantitative outcome data. Sometimes that’s great and improves the work and evaluation, but more often than not what I see happening is that nonprofits end up spending a lot of time and money creating low-quality data that has far less useful information than the qualitative reporting and related due diligence that funders used to rely on. It’s been very discouraging to see some amazing nonprofits lose funding because they don’t play the data game well with funders.

Similarly, in my work in the impact metrics space and the emerging ESG and other sustainability reporting requirements, I have seen a trend (not accusing anyone here of thinking this way) which is basically faith in the idea that if we make people put numbers to things, it’s always better than narrative. This faith then extended to “all data is good data,” yet the metrics systems (even the best ones, like TCFD, if you look closely) are kind of a mess, with so many reporting options that the resulting data really is not comparable between reporters.

I also have a bias from my banking life, where we have lots of financial data that has been well specified and is pretty easy to report comparably between companies, yet it still only describes the past and has a lot less power to tell the future than we would like. For example, if I wanted to understand a company’s CAPEX, I could see that past easily, but I’d much rather read a narrative about the company’s R&D plans than look at the data.

Because of all those biases, I have been trying to learn more about the breaking point in the utility of qualitative vs quantitative approaches and how that question relates to information in impact, ESG, etc.

I don’t have an answer or even a good path yet.

In regard to your suggestion – “next step is to put together a commonly agreed pool of example (hypothetical) investments… then apply different approaches to them” – that’s a good idea and seems worth pursuing.

Ellen Maginnis
Partner & Head of Impact @ Volery Capital Partners
November 2022

Hi all – I am late to this discussion but have loved reading through the responses. It is incredibly helpful to see these different approaches to impact ratings and how they reflect overall impact logic. I personally love the “0 on any dimension = 0 impact” approach. I haven’t successfully persuaded anyone I’ve worked with to adopt it, but I feel emboldened by the fact that there are others taking this approach!

The biggest “impact logic” issue I grapple with when it comes to impact ratings has less to do with formulas and more to do with the tendency to “cherry-pick” the positive impact that is the focus of the rating. These concepts of weights, dispersion, and even 0 impact only really matter when there is an understanding of how that specific impact relates to other impacts (both positive and negative).

I know there’s a lot of great work being done to get at measures of net impact, and I think impact ratings are an important step in that direction. I also think that there’s more we can do when applying impact ratings to more holistically capture overall impacts.

What if all investors used the formula Jackson proposed AND produced a materiality map that showed all material negative and positive impacts, highlighting the specific impact of focus for the rating?

Perhaps it’s for a separate discussion board, but I would love to hear how others are representing material positive and negative impacts in relation to the impact of focus for the impact rating.

Jackson Gates
Research & Content Development Manager @ Impact Frontiers
October 2022

Many thanks to the early contributors to this discussion. It is exciting to see a diversity of approaches to impact ratings math in these first responses – investors pursue diverse impact strategies, and a primary aim of this discussion board is to help investors think about how to craft an impact ratings math approach that best reflects their particular impact goals and theories of change.

Two of the approaches mentioned in the piece that accompanies this discussion board have been cited thus far: the weighted average approach and the multiplication approach.

Neither of these, as currently formulated (pun oh so intended), manages to simultaneously capture three elements of impact logic that we’ve heard various investors espouse:

1.     Some investors ascribe differential weight to different dimensions of impact based on their impact goals and theories of change.

In the 5 Dimensions parlance, some investors’ goals may place a greater emphasis on HOW MUCH impact is associated with an investment rather than WHAT kind of impact it’s associated with. An investor whose goal is to help realize the SDGs, for example, may want to prioritize investments that advance an SDG for the greatest number of people over those that advance any one particular SDG. Others may want to prioritize investments that impact underserved stakeholders over those that achieve maximum scale; an education investor, for instance, may choose an investment that improves math scores for a smaller number of students in low-income communities over an investment that improves math scores for a greater number of students in middle- and upper-income communities.

The weighted average approach allows for the assignment of weights.

A pure multiplication approach does not allow for the assignment of weights in the traditional manner (i.e. score*weight), though the group & multiply approach allows for the assignment of weights in this way within a group.

 

| Approach | Weights |
| --- | --- |
| Weighted Average | Yes |
| Pure Multiplication | No |
| Group & Multiply | Partial |

2.     Some investors believe some investments have exponentially greater impact than others.

It makes intuitive sense to many investors (see Jonathan Harris’ comment) that a company that reaches 100x more end-stakeholders than a peer working to deliver similar kinds of impact to similar kinds of stakeholders has ~100x the impact.

The weighted average approach often causes a ‘reversion to the mean’ among scores and limits their dispersion, and therefore does not allow for some investments to have exponentially higher scores than others (see Jonathan Harris and Meeta Misra’s comments). While many acknowledge that weighted averages produce an ordinal rather than cardinal ranking, others have pointed out that the weighted average method systematically undervalues the HOW MUCH dimension, and in so doing produces a distorted ordinal ranking of investment impact.

The multiplication approach allows for some investments to have exponentially greater scores than others, particularly when scores are made scale-sensitive (see Jonathan Harris’ comment).
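
As a quick numerical illustration (the two profiles below are hypothetical, and HOW MUCH is treated as a scale-sensitive value rather than a 1-5 score):

```python
# Two profiles identical except that HOW MUCH reflects a ~100x difference
# in reach. Averaging compresses the gap; multiplying preserves it.
small = {"WHAT": 4, "WHO": 4, "HOW MUCH": 1,   "Investor Contribution": 4}
big   = {"WHAT": 4, "WHO": 4, "HOW MUCH": 100, "Investor Contribution": 4}

def average(profile):
    return sum(profile.values()) / len(profile)

def multiply(profile):
    result = 1.0
    for v in profile.values():
        result *= v
    return result

print(average(big) / average(small))    # ~8.6x: the 100x gap is compressed
print(multiply(big) / multiply(small))  # 100.0: the gap is preserved
```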

 

| Approach | Weights | Dispersion |
| --- | --- | --- |
| Weighted Average | Yes | No |
| Pure Multiplication | No | Yes |
| Group & Multiply | Partial | Yes |

3.     Some investors believe an investment that receives a score of 0 on any dimension of impact – WHAT, WHO, HOW MUCH, Enterprise Contribution, Investor Contribution, or Impact Risk – has 0 impact.

This idea is a bit more radical, and is an extension of the investor contribution logic that has gained traction in recent years: if an enterprise would have received the same capital, on the same terms (and the same non-financial support), in your fund’s absence (i.e. 0 Investor Contribution), your investment won’t have caused any impact that wouldn’t have occurred otherwise. As such, some investors believe an investment with 0 investor contribution should receive a 0 on their impact rating – in other words, any investment that causes some non-zero positive impact that wouldn’t have occurred otherwise should be rated higher than any investment that causes no impact that wouldn’t have occurred otherwise (even if the underlying enterprise is highly impactful).

Some have extended this line of reasoning, however, to the other dimensions of impact. Imagine an investment with the following impact profile:

 

WHAT: 0/5
WHO: 3/5
HOW MUCH: 5/5
Investor Contribution: 5/5
Impact Risk: 4/5

This investment might be causing significant effects that wouldn’t have occurred otherwise (Investor Contribution) for a large number (HOW MUCH) of moderately underserved stakeholders (WHO), but the WHAT score suggests that that effect is not ‘impactful’ – high scores on WHO, HOW MUCH, and Investor Contribution only entail greater impact if the WHAT itself is impactful. An investment in a video game or streaming service company, for example, might have this kind of impact profile, and some investors want their impact ratings to automatically push products like these to the bottom of their prioritization lists.

The same logic can be applied to the other dimensions. If an investment receives a 0 score on WHO, for example, one might argue the total impact is necessarily 0, as in the case of an investment that increases wages (WHAT) for huge numbers (HOW MUCH) of millionaires (WHO) that wouldn’t have occurred otherwise (Investor Contribution); a 0 score on HOW MUCH would imply no one is experiencing the intended impact, even if the target stakeholder group is underserved (WHO) and has articulated a need for the product or service in question (WHAT).

*In the case of impact risk, I have found the logic of a discount factor score (points received/possible points) intuitively resonant for many of the same reasons, though that subject is probably best left for another time.

Any approach that involves addition (weighted average and group & multiply) will not reflect this logic. Pure multiplication does, at the expense of the investor’s ability to apply different weights to different dimensions of impact.

 

| Approach | Weights | Dispersion | 0 on any dimension = 0 total impact |
| --- | --- | --- | --- |
| Weighted Average | Yes | No | No |
| Pure Multiplication | No | Yes | Yes |
| Group & Multiply | Partial | Yes | No |

Having been troubled initially by this puzzle, I wanted to offer, for consideration and discussion, a proposed solution for investors who subscribe to all three of these elements of impact logic:

Rating = WHAT^(w1) × WHO^(w2) × HOW MUCH^(w3) × Investor Contribution^(w4)

where w1 through w4 are the weights the investor assigns to each dimension.

This formula allows investors to assign different weights to different dimensions of impact; can generate impact scores that are orders of magnitude greater than others; and automatically assigns a 0 score to an investment that receives a 0 on any dimension of impact.
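
To make the comparison concrete, here is a minimal sketch of how the three behaviors play out on the video-game-style profile above (the weights are invented for illustration; this is not an official Impact Frontiers tool):

```python
# Compare weighted average, pure multiplication, and weighted multiplication
# on a profile with WHAT = 0. Scores are 0-5; weights are illustrative.
scores  = {"WHAT": 0, "WHO": 3, "HOW MUCH": 5, "Investor Contribution": 5}
weights = {"WHAT": 1.0, "WHO": 1.5, "HOW MUCH": 2.0, "Investor Contribution": 1.0}

def weighted_average(scores, weights):
    return sum(scores[d] * weights[d] for d in scores) / sum(weights.values())

def pure_multiplication(scores):
    result = 1.0
    for s in scores.values():
        result *= s
    return result

def weighted_multiplication(scores, weights):
    result = 1.0
    for d in scores:
        result *= scores[d] ** weights[d]  # weights act as exponents
    return result

print(weighted_average(scores, weights))         # ~3.55: nonzero despite WHAT = 0
print(pure_multiplication(scores))               # 0.0: any 0 zeroes the rating
print(weighted_multiplication(scores, weights))  # 0.0: any 0 zeroes the rating
```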

 

| Approach | Weights | Dispersion | 0 on any dimension = 0 total impact |
| --- | --- | --- | --- |
| Weighted Average | Yes | No | No |
| Pure Multiplication | No | Yes | Yes |
| Group & Multiply | Partial | Yes | No |
| Weighted Multiplication | Yes | Yes | Yes |

Some investors don’t subscribe to all three of these logics – values alignment investors, for example, do not always hold an explicit aim of causing impact that wouldn’t have occurred otherwise, and as such may want their impact ratings math to allow for investor contribution to affect an investment’s rating, but not rule out those investments with 0 investor contribution.

For those that do subscribe to all three, however:

1.     Are there other formulas that reflect all three of these elements of impact logic?

2.     Are there other elements of impact logic that an investor may want their impact ratings math to reflect?

3.     If not, should all investors use this formula in their impact ratings?

John Berger
COO @ Women of the World Endowment
October 2022

Jonathan and anyone who multiplies: how do you think about the multiplicative effect re the propagation of error? This data all seems to be high-error no matter how you define error (SD, etc.), so doesn’t multiplying make it likely you are losing the signal?

How do you think about the units? Multiplying 20 apples × 20 oranges does not get you the number 400 as an answer. At least weighting methods let you hack that problem.
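
A rough back-of-the-envelope simulation of the propagation worry (the 20% error level and the scores are invented purely for illustration):

```python
# With independent relative errors on each dimension, multiplication
# compounds them (relative errors add roughly in quadrature), while
# averaging dampens them. Numbers below are illustrative only.
import random
import statistics

random.seed(1)
true_scores = [3.0, 4.0, 2.0, 5.0]

def product(xs):
    p = 1.0
    for x in xs:
        p *= x
    return p

def relative_spread(rating_fn, rel_error=0.2, trials=10000):
    samples = []
    for _ in range(trials):
        noisy = [s * random.gauss(1.0, rel_error) for s in true_scores]
        samples.append(rating_fn(noisy))
    return statistics.stdev(samples) / statistics.mean(samples)

print(relative_spread(lambda xs: sum(xs) / len(xs)))  # ~0.10: noise dampened
print(relative_spread(product))                       # ~0.40: noise compounded
```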

Jonathan Harris
Founder @ Total Portfolio Project
September 2022

We multiply because we want our impact ratings to be in line with the potential scale of our impact as investors.  Our general theory of change, if put into an equation, is that:

Investor Impact per dollar = Gross Impact per dollar × Enterprise Contribution × Investor Contribution

Gross impact per dollar is the amount of outcomes per dollar of enterprise size, without adjusting for the contributions (i.e. additionality). We think of this as another multiplication: ‘Number of stakeholders’ × ‘Amount of change experienced’ × ‘Length of change’ × ‘Importance of the change’. We assess these components individually; we find it is helpful to be more granular and specific than the standard ‘What’, ‘Who’ and ‘How Much’.

So, we are basically all about multiplication! That said, if an investment involves multiple impact pathways (e.g. both from our capital and our engagement, or impacts on different outcomes) we would do separate multiplications for each pathway and sum these results together. 

Another key feature of our impact math is what we call ‘scale-sensitive’ scoring. For simplicity, we like to score each dimension on a scale of 1-5. But we realize that the actual values 1-5 do not represent how much better we think better scores really are. So, before we multiply, we convert our simple scores into scale-sensitive values. The conversion depends on the dimension. See examples in the attached tables.
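
As a purely hypothetical sketch of the mechanics (our real conversion tables are in the attachments and differ by dimension; the doubling scale below is just for illustration):

```python
# Convert simple 1-5 scores to scale-sensitive values before multiplying.
# The doubling conversion here is an assumed example, not our real table.
SIMPLE_TO_SCALE = {1: 1, 2: 2, 3: 4, 4: 8, 5: 16}  # each step ~2x better

def rating(simple_scores, enterprise_contribution, investor_contribution):
    """Multiply scale-sensitive values and contribution factors (0-1)."""
    gross = 1.0
    for s in simple_scores:  # e.g. stakeholders, amount, length, importance
        gross *= SIMPLE_TO_SCALE[s]
    return gross * enterprise_contribution * investor_contribution

# Two opportunities one simple point apart on two dimensions:
print(rating([3, 4, 3, 4], 0.5, 0.3))  # 4*8*4*8 * 0.15 = 153.6
print(rating([4, 5, 4, 5], 0.5, 0.3))  # 8*16*8*16 * 0.15 = 2457.6
```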

The result is that our ratings have plenty of dispersion, both because we multiply and because we use scale-sensitive scores.

In the past, instead of scale-sensitive scores, some of our collaborators have stuck with simple scores, even when they agree that the actual difference between a “3” and a “4” may be 2x and not just “1 point”. But we have found that this makes it easy to forget the dramatic dispersion in investor impact that we observe between opportunities. It makes it easy to spend time chasing an opportunity you wouldn’t have prioritized with scale-sensitive scores.

We assess enterprise contribution and investor contribution in terms of dimensions that we call “Scalability” and “Neglectedness”. 

  • Scalability is how productively additional resources can be used right now – it’s no good throwing more money or time at something if there is some other bottleneck.

  • Neglectedness is the extent to which the supply of resources to an opportunity is limited. It is inversely related to how much money the market is allocating to an enterprise or industry, either because of risk, lack of awareness or other frictions. 

  • For enterprise contribution, we assess these dimensions for the industry or area to which the investee organization is contributing. For investor contribution, we assess them for the investee organization itself.

We aim for our contribution scores to be conservative. We use percentages between 0 and 100% (though if we thought we were having really large or negative impacts we could go outside this range). This makes the ultimate rating scores often quite small. But we like this because it reminds us to be humble about how difficult it is to generate impact. We could multiply the end ratings by a ‘normalization’ factor if we were worried about legibility (e.g. ratings not fitting into a common range like 0-10).

Note that our ratings math doesn’t require ‘weights’. We like this because then we can focus on debating the scores on each dimension instead. That said, we have also experimented with different team members assessing their own scores for an opportunity and then taking a weighted-average of these scores, weighted by the relevance of each analyst’s expertise – though we don’t yet do this routinely.

We don’t directly include impact risk in our ratings math. Instead, we assess the lower and upper overall rating we would plausibly apply to an opportunity. This gives us an uncertainty range for each opportunity. Then, when comparing between opportunities, we compare both the main estimates and the uncertainty ranges. If one opportunity’s entire uncertainty range is above another’s, then the decision is easy. If there is a lot of overlap, then we need to be more careful. We find using these ranges to be a really useful tool for directing our conversations.
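
In code, the comparison heuristic is simple (the opportunities and ranges below are invented for illustration):

```python
# Compare opportunities by their plausible rating ranges: if one range sits
# entirely above another, the decision is easy; otherwise look closer.
ranges = {
    "A": (2.0, 8.0),    # (lower plausible rating, upper plausible rating)
    "B": (10.0, 30.0),
    "C": (5.0, 12.0),
}

def compare(a, b):
    lo_a, hi_a = ranges[a]
    lo_b, hi_b = ranges[b]
    if lo_a > hi_b:
        return f"{a} dominates {b}"
    if lo_b > hi_a:
        return f"{b} dominates {a}"
    return f"{a} and {b} overlap: be more careful"

print(compare("A", "B"))  # B dominates A
print(compare("B", "C"))  # B and C overlap: be more careful
```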

These ranges also highlight one of the fatal flaws in weighted averages, from our perspective: the ‘reversion to the mean’ effect means not only that great opportunities stand out less, but that with weighted averages almost all opportunities overlap in terms of their uncertainty ranges.

By the way, we also apply similar ratings math at the industry/sector level to help us prioritize which areas we target in our search for opportunities. We are considering doing this for SDG sub-goals and we’d be happy to talk with you if you’re interested in collaborating on that.

I’m attaching two example tables and a separate visual representation of the difference between multiplying and adding when constructing impact ratings (from an internal exercise).

Excited to see these topics getting discussed!

Laura Mixter
ESG & Impact Reporting Lead @ LISC
July 2022

Getting the math right was definitely one of the trickiest parts of developing LISC’s Impact Matrix. We took a hybrid approach of a few of the ideas above. As discussed in the article, our matrix has six dimensions. In order to determine the relative weight of each one in the final score, we surveyed LISC staff on which dimensions they thought were most important and weighted them accordingly – Community-Centered and Effectiveness are both worth 30 points each, Who and Alignment are each worth 20 points, and Contribution and Risk Mitigation are 15 points apiece. We then had staff rate the indicators within each dimension similarly to assign weights within each dimension.

The final score is presented both as a cumulative total out of 130 points and as a radar graph showing how a project performed relative to the portfolio median on each dimension.
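
A minimal sketch of the scoring arithmetic described above (the dimension weights come from the post; the indicator-level weighting is omitted, and the per-dimension fractions are invented):

```python
# LISC-style dimension weights (they sum to the 130-point maximum above).
WEIGHTS = {
    "Community-Centered": 30, "Effectiveness": 30,
    "Who": 20, "Alignment": 20,
    "Contribution": 15, "Risk Mitigation": 15,
}
assert sum(WEIGHTS.values()) == 130

def matrix_score(dimension_fractions):
    """dimension_fractions maps dimension -> fraction of points earned (0-1)."""
    return sum(WEIGHTS[d] * f for d, f in dimension_fractions.items())

# A hypothetical project earning 80% of available points on every dimension:
print(matrix_score({d: 0.8 for d in WEIGHTS}))  # 104.0 out of 130
```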

About this discussion
Impact Ratings Math
This discussion board provides a forum for exploration of different ways to calculate an expected impact rating for prospective investments.