The PBRF challenge

Part 2 – designing an alternative

At the end of the first article in this series, I took on the challenge.  How can research funding be modified so as to build on the successes of the PBRF, while also simplifying it and reducing the “back-breaking” compliance burden?  How can we create a new method of allocating research funding, a method as robust and credible as the PBRF, with as much integrity as (or even more integrity than) the PBRF, an approach that lifts the research performance of the system as effectively as the PBRF has – but simpler, less costly and less intrusive?

Preserve and build on the successes of the PBRF.  From the analysis in article 1 of this series, it follows that what we need is:

  • First, designated bulk funding for research, funding that allows each institution the freedom to decide how to apply the money to advance its institutional strategy for research
  • Second, research funding that rewards and enhances the three principal values of higher education research identified in article 1:
    • research that builds human capital, that creates graduates with highly developed critical and enquiry skills who have the ability to boost innovation in the workforce and help firms build absorptive capacity
    • research that contributes to our society – to the culture, to community development, to the economy
    • research that informs and shapes learning, that uses enquiry as a teaching method and an assessment approach, thus ensuring that learning is current and exciting and helps build transferable skills
  • Third, a broad and inclusive definition of research, valuing applied research and practice/clinical research, as well as pure, leading-edge and blue skies research, that embraces multiple forms of knowledge creation, such as design, fine arts, musical composition, performance and literary creation, as well as scientific journal articles
  • Fourth, research funding that reflects scale – the larger an institution’s research work, the greater the funding, all else being equal
  • Fifth, funding that reflects research performance, research quality.

All that, and, at the same time, a system that minimises the compliance burden.

Measuring research

In part 1 of this series, I noted that the PBRF already has proxy measures of the human capital effects of an institution’s research and of its contribution to society.  The problem areas are scale and performance.

Scale may sound like something that lends itself to precise quantification – we need only count the number of research active staff.  How hard can that be?  Except that … it involves assessing which individuals are research active and which are not, an assessment that involves drawing a line between activities that qualify simply as scholarship and those that cross the boundary to research. We need to make a qualitative judgement before undertaking a count.

In the PBRF, the quality evaluation (QE) panels establish which of the staff put forward by each institution are research active.  That means the QE is used to determine the scale of institutions’ research effort.  The QE is also used to assign each eligible researcher to a quality category.  So it is the QE that is used to rate performance.

But it’s the QE that is at the heart of the problem of compliance burden.  It’s the QE that requires: each academic to compile a detailed portfolio; each institution to review, critique and assemble the portfolios; the assessment panels to pore through those many dozens of portfolios and assign quality categories, often imperfectly; and the TEC to create the systems, engage the panellists, oversee the assessment and compile the results. 

It is the purpose of this article to explore ways to reduce the burden.  So it’s the QE that we need to address.  We need to look for simpler but robust ways to assess scale, quality and performance.

So the challenge of simplifying the PBRF comes down to two questions:

  • First, how can we assess the scale and performance of an institution’s research in a simple way, without a high compliance burden – that is, without using the QE?
  • Second, having worked out how to assess scale and performance, how can we use those assessments, together with measures of the human capital and the societal contribution dimensions of institutions’ research, to provide a sound basis for the allocation of some hundreds of millions of dollars of public funding?

How we measure …

Measures are broadly of three types …

  • metrics that precisely quantify or measure what it is we are assessing
  • assessment by independent experts who are capable of making judgements
  • proxy measures – that is, where we choose something measurable that approximates what we want to assess, something inexact but related to it, something “close enough” to guide our judgement.

Complex activities don’t usually lend themselves to precise and exact measurement.  Research is no exception.  So precise measurement is out.  Expert assessment – peer review – is at the heart of most research evaluation: deciding whether a paper merits inclusion in a journal, or whether a team’s project is good enough to be funded.  The QE – the way the PBRF currently measures scale and quality – is an example of expert assessment.  But the whole purpose of this exercise is to find an alternative to the back-breaking QE.

So, if the first two are out, we need to use the third way.  Our plan has to be to find proxy measures – and in particular, proxy measures for scale and performance.

My suggestion …

My suggestion is for a four-component model.  Research funding should be allocated using a formula that takes account of:

  • Research degree completions
  • External research income
  • A measure of the scale of the institution’s research activity
  • Assessment of a small number of research impact case studies.

The first two are status quo, while the third and the fourth would replace the QE. 

The scale measure and the impact case studies would have to deliver the same sort of information as the QE, but by quite different means – means that are simpler and less obtrusive.

The components would be weighted, just as they are now, in a formula that would then split the funding between the participating institutions.  I make no proposal for those weightings … that is a design element that needs to be considered in light of the different incentives that each component creates.  Weightings should aim to balance those incentives, so as to mitigate the risk that institutions prioritise (say) scale over impact, or blue-skies enquiry over practice-based research.  And there should be simulations to test any proposed set of weightings.
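To make those mechanics concrete, here is a minimal sketch (in Python) of how a weighted formula of this kind could split a funding pool.  Everything in it is invented for illustration – the weightings, the pool size and the institutional figures – and it assumes one plausible design among several: each component is first converted into a share of the sector total for that component, and the shares are then combined using the weights.

```python
# Minimal sketch of a weighted funding formula. All figures are invented;
# the impact case studies are treated separately, as a pass/fail threshold.

# Raw component results per institution:
#   research degree completions, external research income ($m),
#   and the scale measure (fractionally counted indexed papers).
institutions = {
    "Uni A": (450, 95.0, 5200.0),
    "Uni B": (160, 30.0, 1800.0),
    "Uni C": (60, 8.5, 600.0),
}

weights = (1/3, 1/3, 1/3)      # illustrative only; to be set after simulation
pool = 315_000_000             # indicative size of the annual fund

def component_shares(index):
    """Each institution's share of the sector total for one component."""
    total = sum(values[index] for values in institutions.values())
    return {name: values[index] / total for name, values in institutions.items()}

shares_by_component = [component_shares(i) for i in range(3)]

for name in institutions:
    composite = sum(w * shares_by_component[i][name]
                    for i, w in enumerate(weights))
    print(f"{name}: {composite:.1%} of pool = ${composite * pool:,.0f}")
```

Because each component’s shares sum to one and the weights sum to one, the composite shares also sum to one, so the whole pool is always distributed.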

How would scale and impact be assessed?  What has to be traded off to enable that simplification? Read on ….

The scale measure …

One possibility is to allocate shares of the scale funding simply on the basis of the number of academic/research staff in the institution – that is the solution advocated by Scottish academic Mark Reed, founder of the research training firm Fast Track Impact[1].  But that’s not practical – it would incentivise game-playing.  It would therefore entail audit, and it would require institutions to provide evidence that each staff member is research active – especially important as there are many non-research academics at polytechnics and PTEs.  Reed’s solution invites a QE-lite process.

Here is an alternative … What I am about to suggest as the scale measure is simple.  It is robust.  It is practical.  It is efficient.  It is focused on the institution, not the individual.  It is a pure measure of the relative size of institutions’ research programmes.  It suits some types of institutions better than others.  It is imperfect but workable.

My suggestion is a count of each university’s research publications over a given six-year time period, as indexed by one of the major research index systems[2], using fractional counting[3].

Yes, I did intend the word university.  This proposal works well for universities, all of which produce some of their research for publication in indexed journals, but probably wouldn’t work for other institutions which are less likely to produce a significant number of indexed publications over a sustained period – see below for the proposed approach for non-university participants.
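For readers who want to see what fractional counting involves in practice, here is a minimal sketch in Python of the counting rule described in endnote [3].  The paper data and institution names are invented; a real implementation would work from the affiliation fields in the licensed bibliometric database.

```python
# Minimal sketch of fractional counting (see endnote [3]): each author carries
# 1/n of the credit for an n-author paper, credit is summed by NZ institution,
# and fractions belonging to overseas co-authors are simply ignored.
from collections import Counter

def fractional_counts(papers, nz_institutions):
    """papers: one list of author affiliations per indexed paper."""
    credit = Counter()
    for affiliations in papers:
        n_authors = len(affiliations)
        for affiliation in affiliations:
            if affiliation in nz_institutions:
                credit[affiliation] += 1 / n_authors
    return credit

# The worked example from endnote [3]: six authors, three from University X,
# two from University Y, one from an Australian university.
papers = [["Uni X", "Uni X", "Uni X", "Uni Y", "Uni Y", "Aus Uni"]]
print(fractional_counts(papers, {"Uni X", "Uni Y"}))
# Uni X gets roughly 0.5 and Uni Y roughly 0.33; the Australian 0.17 is dropped.
```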

I am aware of the limitations of this suggestion.  I am aware that it will be seen as controversial – that it will be welcomed as obvious by some, but that it will be horrifying to others.  I am aware that it was rejected by the original working parties.  I am aware that smaller institutions with a narrow research base will feel aggrieved.

But hear me out. 

There are advantages in this approach …

Counting indexed publications gives a sense of scale; it is a proxy for the size of a university’s research effort.

Indexation means that the papers have been subject to independent peer review as part of the process of being accepted by a journal.  In other words, this proposal builds off expert assessment made by disciplinary specialists.  That is a guarantee of a minimum quality standard.

It allows disaggregation by broad field of study, enabling funding to be weighted higher for fields of research that are capital-intensive.

A six-year time period matches the current PBRF cycle.  It also reduces the potential risk of instability of data. 

This proposal has low compliance cost – the peer reviewers and the journal have borne that cost; we can make use of that investment.  It is also cost effective, relying solely on the purchase of a database licence and some standard analytical tools.  See the Appendix for an example of how this would work. (That table took a few minutes to compile from Clarivate Web of Science data curated by Leiden University CWTS).

… and obvious disadvantages …

A narrow conception of research

The biggest challenge is that the research scale funding in universities would be driven solely by one type of research.  

In Part 1 of this series, I gave the example of a hypothetical family law academic who eschews international journal publication in favour of local, low-status journals that are read by judges and that, therefore, generate greater impact.  I also used the example of clinical research (published in journals like Practical Neurology[4]) that may influence medical practice.  I might also have mentioned Mātauranga Māori research – a field enormously important in Aotearoa NZ but whose outputs would mostly appear in local journals that fly beneath the radar of the indexing databases.  And then there are the other forms of knowledge creation – performance, artistic creation… A whole slice of the knowledge creation output of the system is uncounted in the bibliometric databases.  These are all valuable and important parts of the system; does this proposal distort the balance?

There are two rejoinders to the criticism:

  • The bibliometric analysis is being used solely as a proxy for the relative scale of the universities’ research effort. It makes no claim to being comprehensive.
  • It influences only one driver of the research funding.  Any distorting effect is moderated by the presence of the other measures (such as research degree completions). See also the comment on constructing the formula below.

That said, there is a risk that a university might push its academics to submit papers to indexed journals in preference to other, local journals.  I doubt, however, that the family law academic discussed in Part 1 and referred to above would be prepared to abandon the intention to influence judicial thinking in favour of a focus on theoretical research that would appeal to the editors of the Columbia Law Journal.  And I am quite certain that researchers who focus on Mātauranga Māori would continue to focus on Mātauranga Māori!

The breadth of institutional focus

One matter that may concern some vice-chancellors is that, if a substantial share of an institution’s knowledge creation activity is in creative fields like fine arts and performance and/or in Mātauranga Māori and/or in applied or clinical research, then that institution may consider that its count of indexed papers (and hence its scale reading) will be understated relative to universities whose portfolios are more heavily oriented towards the sciences, humanities, health sciences and social sciences.

The response to that is two-fold:

  • Every one of New Zealand’s universities has a substantial share of its knowledge creation activity in fields that don’t lend themselves to publication in indexed journals.  Any could make this case.  They are all affected to an extent by the bias in the measurement.  But they all also have substantial research activity where the primary aim is to win publication in an indexed journal.
  • As noted above, this is only one of four components to the proposal.  If the weightings of the four components are carefully set and the model thoroughly tested before finalisation, perverse incentives will be reduced.  Reduced – elimination is impossible.

Slicing and dicing

Making a simple count of indexed papers the metric for this proposal carries another risk – that a piece of research could be cut into several “slices”, increasing the number of papers for the same amount of research.  Maybe.  That assumes that the journal’s peer reviewers would not ask the obvious question – why didn’t you explore these other three angles? – unaware that the researchers had explored exactly those angles but were holding them back to bolster the paper count.  It assumes the peer review process used by the journal is not especially robust.  It assumes that researchers are motivated by earning funding for the institution, rather than by the professional satisfaction that comes from having a really excellent paper accepted[5].

In fact, some critics are of the view that the QE already creates precisely that incentive.  In an interview with Politik in 2023, the newly appointed science minister, Hon Ayesha Verrall, said that the QE created perverse incentives for academics.  She said: “… one big piece of work can be cut into ten tiny papers, none of which are really worth reading by the end.”  She told Politik that ten short papers under the PBRF regime are worth considerably more than one substantial paper.

It’s easy to dismiss Verrall’s comment as simple hyperbole.  It is, prima facie, a misreading of how the QE was intended to work and how it has played out in practice (an unfortunate misreading, since it came from a minister!).  Frankly, it is implausible that peer reviewers for indexed journals are prepared to recommend for publication papers that are “not worth reading”[6].

Of course, slicing up research is a risk to the integrity of this proposal.  That risk could be mitigated by including other bibliometric variables (such as citations) or by weighting counts by journal ranking.  But complicating the proposed simple count of papers with weightings and additional variables would be unfortunate; my suggestion is to stick with simple counts and monitor trends.  If the risk is realised, add a citations measure.

What about polytechnics? Wānanga?  PTEs?

There are likely too few indexed papers by higher education researchers outside the universities for the proposal outlined above to work; with a very small number of papers, the funding would be small and unstable. My suggestion is to assign a fixed proportion of the scale funding pool to the participating TEOs outside the universities, using as a starting point the historic share of the PBRF earned by those sub-sectors[7].

Then allocate that pool among the participating providers according to the number of Level 7+ EFTS. 
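As a rough illustration of how that two-step allocation might work, here is a minimal sketch in Python.  The pool size and EFTS figures are invented; the 3.8% non-university share is the historic figure quoted in endnote [7] and is used here purely as a starting point.

```python
# Minimal sketch: ring-fence a fixed share of the scale pool for
# non-university TEOs, then split it pro rata by Level 7+ EFTS.
scale_pool = 100_000_000        # hypothetical scale-component pool
non_university_share = 0.038    # historic non-university share of the PBRF (endnote [7])
non_uni_pool = scale_pool * non_university_share

level7_efts = {                 # invented Level 7+ EFTS figures
    "Polytechnic A": 2400,
    "Wānanga B": 1600,
    "PTE C": 400,
}
total_efts = sum(level7_efts.values())

for provider, efts in level7_efts.items():
    allocation = non_uni_pool * efts / total_efts
    print(f"{provider}: ${allocation:,.0f}")
```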

The rationale for this suggestion is that the need to provide research funding arises from section 268 of the Education and Training Act 2020, which lays out the expectation that, at degree level, much of the teaching is undertaken by those active in research.

In addition, the three wānanga receive additional research funding outside the PBRF – that funding should continue, in addition to the share of the scale funding (and other components of this proposed new approach) earned by application of the formulae and metrics proposed here.  

So?

Like all funding system design, this proposal is a trade-off.  Simplicity versus breadth.  Simplicity versus precision. A trade-off worth making?  In the eyes of many, yes. 

An impact measure

One of the three important values of research in higher education is its contribution to our culture, to community development and to the economy.  While not all research projects make a contribution of that sort, it is important that institutions’ research programmes as a whole create societal value.  Demonstrating societal impact is an important means of creating social licence, a justification for taxpayer subsidies[8].

But impact is complex and can be hard to determine.  We have to recognise that knowledge advances in tiny steps.  One research group investigates a cellular mechanism in mouse brains, leading another group in another university to conduct MRI scans of human brains to determine whether that mechanism might apply in humans, leading another group in a third university to … until, at step 17 of the sequence, years later, a new therapy is created.  The impact, the new therapy, is a consequence of all those steps.  Nor is it always easy to assess the long-term impact of research.  For example, the significance of Roy Kerr’s solution to Einstein’s field equations wasn’t really understood for a decade[9].

In other words, some research produced in institutions will lend itself to impact analysis; much will not.

Currently the PBRF has two (imperfect) mechanisms for looking at research impact. 

  • Impact is implicit in the external research income measure, in that external funders of research are, for the most part, keen to advance particular societal – cultural, community, business etc – interests.  They pay because they hope that the research projects they fund will create change and generate benefits.
  • Also, comments on impact are often included in the research portfolios submitted by individual academics during the QE.

So, if the QE is removed, there is a case for a more explicit impact assessment.  The UK, facing this same issue a decade ago, came up with the notion of impact case studies.  Each institution participating in the Research Excellence Framework (REF) assessment is required to produce a number of case studies in a standard (simple) format[10], with the number of cases required linked to the scale of the institution’s research activity – in the 2014 REF, the number of impact case studies ranged from two (for example, the Royal Agricultural University) to nearly 300 (University College London).  These are all assessed (on their reach and their significance) and scored, with the assessment counting for 25% of the overall score.

In 2014, there were more than 6,000 case studies across the whole of the UK!  In Scotland, the UK country closest in population size to NZ, 808! That’s an awful lot of assessing – it’s hard to imagine just how the assessors were able to split so many hairs so precisely in an exercise that carries so much financial significance.  It makes the QE look like a breeze.

Nonetheless, there is the germ of a great idea in this.

There is a case for requiring participating institutions here to do something similar, but on a radically different scale and with strict limits (similar to those used in the REF) on the length and scope of cases.  Say, one case study each for the participating institutions outside the universities, and from three to eight, depending on scale, in the universities.  And don’t attempt to score the cases; make submission of a satisfactory case study a threshold condition.  In other words, a mandatory requirement for research funding would be submission of the required number of impact case studies, which might then be presented to, discussed with and questioned by a small group of independent peers.  Pass or fail.

The other two components

The other two components – external research income and research degree completions – are both valuable.  They also work well.

RDC

This gives a proxy measure of the contribution an institution’s research work makes to developing advanced conceptual, critical, enquiry and analytical skills that contribute to the country’s human capital and to the innovative capacity of workplaces[11].  The administration of the component and the collection of data to feed the process are both well-established and effective.

ERI

Any external research grant, from a commercial firm, from an NGO, from a government agency or from a government research fund (such as the Endeavour Fund, the Marsden Fund or one of the HRC funds), follows an assessment of the researchers’ ideas, their capability, their track records, their skills.  So when a higher education researcher or research group wins a research grant, that is a proxy for quality – the system is taking advantage of the fact that someone else has made the assessment.  It is also a proxy for likely short-term impact; the funder won’t want to hand out funding unless there is a likelihood that the investment will show a return in reasonably short order.

The 2012/13 review led to higher weightings being set for non-government and overseas-sourced ERI, and these weightings were increased following the 2020 review.  I don’t think those were good moves, in that they carry the implication that public-good research and advice to government agencies is of lower status than commercially oriented research.  Yet the government’s research funds are very tightly contested; to win a Marsden, Endeavour or HRC grant is a mark of quality.

The ERI data collection, modified following the 2012/13 review of the PBRF, appears to work.

I would recommend keeping both the ERI and the RDC in their current forms.

Creating a funding formula

So we have four possible components.  One – the impact assessment – should be a threshold condition, a prerequisite for participation.  The others need to be weighted so they can be combined.  I am not going to suggest weightings.  But I would caution against weighting any of them too heavily.  In particular, if the weighting of the scale factor were too great, it would create incentives to focus heavily on the production of one type of research, namely indexed papers.  A starting point might be to weight the three equally.

But there needs to be sensitivity testing, simulations to check if and how funding would move.
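A sensitivity test of that kind is easy to sketch.  The following Python fragment, using invented component shares, simply re-runs the allocation under a few alternative weighting sets and shows how each institution’s share of the pool would move; a real exercise would use actual RDC, ERI and scale data and many more scenarios.

```python
# Minimal sketch of sensitivity testing: compare composite funding shares
# under alternative weightings. Component shares are invented and each
# column (RDC, ERI, scale) sums to 1 across the institutions.
component_shares = {
    "Uni A": (0.45, 0.55, 0.60),
    "Uni B": (0.35, 0.30, 0.30),
    "Uni C": (0.20, 0.15, 0.10),
}

scenarios = {
    "equal weights": (1/3, 1/3, 1/3),
    "scale-heavy":   (0.2, 0.2, 0.6),
    "RDC-heavy":     (0.5, 0.25, 0.25),
}

for scenario, weights in scenarios.items():
    print(scenario)
    for institution, shares in component_shares.items():
        composite = sum(w * s for w, s in zip(weights, shares))
        print(f"  {institution}: {composite:.1%} of the pool")
```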

Balance and trade-offs

The PBRF has transformed university culture in Aotearoa NZ.  Research now has greater focus.  Research performance has grown.  That needed to happen.  The PBRF provided the trigger.

Has the pendulum swung too far towards research (and therefore, away from teaching)? From an outsider’s viewpoint, there now appears to be greater innovation in teaching than twenty-five years ago.  And there is a much-enhanced focus on learner success, on supporting learning and on recognition of disparities in learning.  I am not convinced that the pendulum has swung too far.

As for the incentives to lift research performance …. There is no perfect way of allocating research funding.  Every approach is a trade-off – complexity, comprehensiveness, precision, incentivisation and compliance cost against simplicity with relatively less precision. If, as so many argue, there is a diminishing return on the PBRF, then we need to look at a simpler approach, while retaining the strengths of the PBRF, namely, its recognition of the societal value of research and the incentives it creates for academics to do great work and to strive for better. 

I worry about privileging one type of research in the new scale measure.  I worry about whether this proposal gives due weight to areas like Mātauranga Māori.  And to institutions outside the universities.  But, equally, I worry about the status quo, whether the QE panels, facing so many detailed portfolios, are really in a position to assess them all without reverting to old, traditional paradigms.  I worry about the future of a process that has created such rancour – its survival has to be at risk.

No one can say for certain whether what I suggest in this article hits the mark, whether it would be better in practice than the PBRF, or better than a flat undifferentiated fund based on the number of research active staff. 

But, if nothing else, the analysis in this article is a recognition of the wickedness of the problem.

Bibliography

Follow the link above ….


Endnotes

[1] See Reed (2024).  Reed is also professor of rural entrepreneurship at Scotland’s Rural College.

[2] There are two main commercial research databases that index nearly all the main research journals: Clarivate’s Web of Science (WoS) and Elsevier’s Scopus database.  Refer to the Appendix for a short discussion of the differences between the two.  See also Cascajares et al (2021), Kumpulainen and Seppänen (2022) and (especially) Pranckutė (2021) for a discussion of the relative merits and the complexities.

[3] Fractional counting means that, if a paper’s authors are from more than one institution, each of the contributing institutions will receive a fraction of the credit for that paper.  So if a paper has six authors, three from University X, two from University Y and one from an Australian university, then X’s contribution will be 0.5 and Y’s 0.33, while the remaining 0.17 is ignored for the purposes of counting the NZ universities’ shares.

[4] Actually, Practical Neurology is indexed both in Scopus and the Web of Science.  The WoS rates its Impact Factor as zero – meaning the articles published in the recent past have attracted very few citations.  While the journal’s bibliometric impact factor is low, that doesn’t mean that those articles haven’t made a difference to practice.  What the WoS means by impact is at variance with the everyday understanding of the term.

[5] And note that under this proposal, unlike the QE, the unit of analysis is the institution, not the individual researcher, so the incentive to play that game is lower.

[6] Verrall’s comments in the Politik interview were certainly intemperate; ministers don’t usually declare an intention to reshape policy in another minister’s portfolio, a policy they have little understanding of and on which they have received no advice.  What is probably more concerning for universities in her Politik comments was her determination to reshape higher education research funding to ensure it supports government research strategy.  That is at variance with the tradition of (and statutory requirement for) institutional autonomy and academic freedom.  Of course, at the time, Verrall was a novice MP and new to her portfolio.  One hopes that she takes more care if and when she next makes Cabinet!

[7] In 2019, taking account of the most recent (2018) QE as well as the RDCs and ERI, the universities received 96.2% of the $315 million PBRF funding.  If that figure was roughly equal to the average over the life of the PBRF, it would mean, under this proposal, that 3.8% of the research scale funding would be split between the participating TEOs outside the universities.  In 2016, the distribution was even more skewed to the universities – 97.3%.  Refer to TEC (2019) and (2016).

[8] Ayesha Verrall’s comments to Politik, quoted above, simply reinforce this point.  Her suggestions that the research funding system be readjusted to align with the government’s strategy can be seen as an expression of her frustration with the prevalence of what she perceived as low-impact research.  But we all need to recall that what is blue-skies research today may trigger a future breakthrough.

[9] Something not helped by the fact that he had retreated to the University of Canterbury, well out of the limelight, and had acquired a taste for competitive bridge.

[10] Cases are presented on a standard template that limits the narrative (in three sections: summary of impact, underlying research and details of impact) to no more than 1,350 words and limits the number of references to 16.  No graphs, no photos, no illustrations. Case studies end up at no more than four A4 pages.

[11] See Arora et al (2023), The effect of public science on corporate R&D.