Collective Rationality in Evaluation

May 11, 2024

Chapter III.2 from the new working paper of the Slovenian Evaluation Society

Key words: Outcome Harvesting, Sensemaker, Causal Mapping, Most Significant Change


Summary: Participatory evaluation struggles to aggregate the diverse individual impacts of complex interventions into a summary indicator of impact. Traditional micro-to-macro approaches prove inadequate. This section explores constructivist methods that emphasize meso-level synthesis and focus on intermediate phenomena (such as the authentic middle, the empty middle, and relative thirdness). The conceptual shift from micro-macro to micro-meta-level synthesis unlocks the full potential of participatory evaluation. It fosters a form of collective rationality that does not require a trade-off between inclusivity and rationality in evaluation. Meso-matrices and Venn diagrams are suggested as ways to operationalize this new concept of synthesis and collective rationality.

Another imperative of participatory evaluation in collective choice is to foster its rationality. A tool achieves a higher degree of collective rationality when, compared with similar approaches, it consistently facilitates more integrated decisions about collective goods from a given set of contributions by individuals and groups. The effectiveness of the tools should therefore be assessed by their ability to address the aggregation problem, specifically by how successfully they synthesise fragmented initiatives into a coherent collective decision. Biased aggregation methods produce constrained results that are restricted, selective, or partial. They lead an evaluation to suboptimal results and inferior decisions that hinder collective learning and significantly disadvantage the collective good. Suboptimal aggregation erodes trust, causing community disengagement and reluctance to cooperate in communal endeavours.

The terms ʻaggregationʼ and ʻsynthesisʼ are used interchangeably to refer to the process of ʻaggregative synthesisʼ (Munn et al.). Aggregation means accumulating data from multiple sources into a unified dataset. It enables synthesis, that is, the creation of a novel understanding of complex issues from the perspective of the whole by making sense of the relationships woven between the individual parts.

Participatory evaluation faces the aggregation problem, which stems from the multifaceted nature of collective choice. The problem lies in effectively accounting for the wide range and diversity of inputs received during participatory evaluation while providing a unified understanding of the collective outcome (Scriven, 2003). Aggregative synthesis allows a unified and consistent whole to be constructed in diverse ways. Various approaches to the construction of wholes exist, such as the hierarchical structure of Most Significant Change (MSC) and Outcome Harvesting (OH), the horizontal network pattern of Causal Mapping (CM), or the combined vertical-horizontal structure of Sensemaker (SM).

Another factor contributing to the aggregation challenge in evaluation, as Scriven (2003), a renowned British-born Australian polymath and academic philosopher, emphasizes, is that its logic must align with the conceptualization (of the unity) of the whole: the evaluation object, its scope, and the underlying theory of change (Creswell). For the evaluation of simple interventions, the methodology of aggregation is simple. It is typically descriptive and quantitative, employing precise, definite, linearly additive, and commensurable inputs. Simple aggregation is limited to summarizing, averaging, or identifying the frequency of specific values or data points within a dataset. Conversely, evaluating complex interventions necessitates complex aggregation methods. These methods handle incommensurable inputs and involve comparative, indirect, and non-linear data. The goal of the evaluation is to identify characteristic patterns, contradictions, or novel meanings hidden in the aggregate. Therefore, selecting an aggregation approach in complex conditions is a creative and context-dependent act requiring careful justification.
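The simple case described above reduces to elementary descriptive statistics. The following is a minimal sketch in Python, assuming a hypothetical set of commensurable scores; the values and variable names are illustrative and do not come from the working paper.

```python
from collections import Counter
from statistics import mean

# Hypothetical commensurable inputs: ratings (1-5) given by six participants.
scores = [4, 5, 3, 4, 2, 4]

total = sum(scores)          # summarizing: linear addition of the inputs
average = mean(scores)       # averaging
frequency = Counter(scores)  # frequency of each value in the dataset

print("total:", total)
print("average:", round(average, 2))
print("most frequent rating:", frequency.most_common(1)[0][0])
```

This works only because every input is expressed in the same metric. Incommensurable inputs cannot be added or averaged in this way, which is why complex interventions call for other forms of synthesis.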

Policy impact evaluation has historically had an uneasy relationship with the need for evaluative synthesis. A central point of contention is the absence of a unified perspective on the role and methodology of aggregative synthesis within the evaluation process. This lack of consensus has led to deep divisions in the evaluation field, splitting it into two separate traditions: non-aggregative and aggregative approaches, or micro-level and macro-level evaluation.

Leopold et al. developed a detailed method for matrix-based impact assessment at the micro level. They believed that the aggregation of fragmented findings into policy-relevant conclusions at the macro level must be avoided because it requires value judgments. On this view, the evaluatorʼs primary task is to inform and comment on specifics rather than to generalize. Leopold et al. argued that refusing aggregation is necessary for neutral evaluation, as it creates a boundary between the evaluator and the policy-maker, protecting the former from political interference.

Since Leopold, many evaluators have refused to aggregate their findings. The Impact Assessment Board (IAB),[1] which advised the European Commission, found that most evaluation studies failed to provide policymakers with useful information at the macro level. Hageboeck et al. observed the same in their meta-evaluation of 340 evaluation reports prepared for the United States Agency for International Development, as did Huitema et al., who evaluated the quality of synthesis in 259 evaluation studies commissioned for the European Union climate policy.

Detailed assessments certainly help better understand detailed impacts at lower levels of the organizational hierarchy. However, they do not make decision-making easier (Diamond). When findings are not aggregated, evaluation studies produce an information overload. Their findings fail to capture the complex reality that policy-makers operate within and only provide banal answers to policy questions (Virtanen, Uusikylä).

The rejection of summation in evaluation, which shifts the responsibility to policy-makers, program managers, or other key stakeholders, assumes that they can perform the task neutrally and inclusively. This, however, is difficult to justify (Stiglitz et al.) given their bounded rationality (Simon) and vested interests (Crano). For this reason, Scriven (1994, p. 378) highlighted that rejecting summation in evaluation means “letting the client down at exactly the moment they need you most”, that is, when they need to make sense of opposing findings and contradictory arguments at the collective level. He warned that rejecting aggregation only exposes evaluation results to manipulation by the included (privileged) minority.

The second tradition, as mentioned, encompasses aggregative evaluation approaches. The customary approach aggregates fragmented evaluation findings, from the micro to the macro level of a composite indicator of the interventionʼs overall success, as if they were commensurable. Here, input data are linked to the aggregate through a common metric, such as euros, tons of CO2 emissions, or the number of votes. The problem arises in qualitative research when dealing with incommensurable data. Furthermore, aggregation must go beyond routinely accumulating commensurable inputs and, in complex situations, intersect quantitative with qualitative data. Aggregation is no longer ‘a routine process of drawing general lessons from local projects; one must also take into account how power, ignorance, and framing play a role’ (Geels, p. 646). Various aggregation approaches exist and reflect the distinct ethical and political concerns, values, preferences, and interests of stakeholders. This diversity mirrors the stakeholders’ differing understandings of societal structures and their power dynamics.

The customary simple aggregation methods are too constrained. Evaluation of complex interventions calls for a more comprehensive approach to synthesis. Decades ago, a constructivist paradigm of synthesis emerged as a solution. Its epistemology moves beyond a purely rational model. It embraces inclusive design-based models and outlines subjective and contextual explanations for complex interventions’ effects.

Emerging within the fourth generation of evaluation approaches (Guba, Lincoln), constructivist evaluation highlights collective sensemaking through dialogue and reflexive processes. Meta-narrative synthesis (Flanagan), a renowned constructivist approach,[2] is explicitly referenced in the CM. Instead of aggregating data points quantitatively, narratives are synthesised thematically. The meta-narrative evaluation explores the contradictions and correlations between findings rather than seeking uniformity between them. It acknowledges the influence of diverse research contexts and disparate theoretical perspectives on the data. While identifying contradictions between them is a crucial step in the evaluation, the meta-narrative approach also outlines their intersections, leading to a more comprehensive understanding.

Consensus-building is another prominent constructivist approach employed in evaluation synthesis (Hill et al.). It emphasizes bringing together participants with diverse perspectives to collaboratively discuss a specific issue and arrive at a mutually agreeable understanding. It utilizes facilitated dialogue to identify key themes while acknowledging the existence of disparate interpretations. Structured discussions, often organized hierarchically, guide the synthesis process. Voting might be used to reach a final decision, as is the case when adhering to the ‘One Person — One Vote’ principle.

Another way constructivists tackle this issue is qualitative comparative analysis (QCA), which the CM follows. QCA identifies the core themes and questions emerging from the qualitative data. Researchers interpret the results and draw conclusions about the causal relationships or patterns observed in the data. Synthesis can be achieved retrospectively using the ʻgrounded theoryʼ approach, which entails searching for the theoretical framework that best explains the data after they have been collected and analyzed.

In contrast to previously mentioned constructivist approaches, the responsive evaluation, introduced by Stake in 2003 and followed by OH (Beardmore et al.), prioritizes stakeholder concerns over predetermined objectives or indicators. It underlines methodologies sensitive to the cultural backgrounds of participants. To identify recurring patterns within the collected data, the evaluator employs triangulation, corroborating findings from multiple case studies.

Additionally, a constructivist lens in evaluation suggests participatory action research (PAR), introduced by Freire. It is followed by the CM (Copestake et al.) and stresses the active participation of community members in all stages of the research process, including data collection, synthesis of results, and interpretation of meaning. The evaluatorʼs task is to bridge research, policy, and community, acting as a facilitator, communicator, and advocate who fosters a participatory environment that empowers social change.

In addition, the OH incorporates several utilization-focused evaluation principles (Patton, 2002). This constructivist approach prioritizes identifying and engaging with the key stakeholders who will eventually use the evaluation findings. The evaluation is then designed to address their specific information needs and preferences. Synthesis of findings occurs during collaborative meetings or workshops. These sessions facilitate discussion and prioritization among key users, ultimately informing action planning based on the evaluationʼs insights.

While none of the four aforementioned tools explicitly employ the dialectical method, the value of this method in evaluating complex problems is recognized by many prominent evaluation scholars. Patton (2010) highlights the relevance of a dialectical perspective within evaluation methodology, as it acknowledges the innate uncertainty and indeterminacy of qualitative data. In todayʼs complex societies, competing demands and conflicting viewpoints are inevitable. The dialectical method offers a critical lens for examining opposing positions, exposing their limitations and inconsistencies (Stake, 1998). Proponents like Dick[3] posit that dialectical synthesis can identify synergies and lead to win-win solutions for all stakeholders, unlike majority voting which creates winners and losers. Guba and Lincoln stress how the dialectical perspective empowers stakeholders to construct their understanding of reality through a cross-sectional synthesis of opposing viewpoints and a dialectical interpretation of the resulting interconnected findings. The dialectical method is thus particularly relevant for participatory evaluation as a deliberative process.

Dialectics is a meta-theoretical hybrid method. It transcends traditional divisions within epistemology by intersecting critical analysis with the constructivist formation of antagonistic viewpoints. This emphasis is particularly characteristic of dialectical constructivism, an epistemological framework grounded in intersectional synthesis. The work of the philosopher Charles Sanders Peirce exemplifies this framework (see below). The core ideas of dialectical constructivism can be operationalized through a meso-matrix (Radej, 2021a): a square matrix of moderate span (at least three independent domains; Simon), that is, one located between small and large numbers of domains.

The constructivist approaches to evaluation discussed above share some key features, most notably that their approaches to synthesis are rooted at the meso level. A meso-level perspective prioritizes the examination of social phenomena at intermediate levels, such as groups, sectors, domains, and thematic areas. Meso-level evaluation underscores the importance of deliberation, facilitation, interpreting diverse narratives, and investigating the construction of shared meanings within these social contexts. Eleanor Chelimsky, a prominent American evaluator, called for a middle-ground approach to evaluation that acknowledges both the specific context of individual programs and the broader organizational or societal environment. Similarly, Scriven (1991) proposed a pragmatic approach to evaluation situated at the meso level, bridging the gap between the micro and macro levels. Examples of meso-level methods are network analysis and synthesis, matrix models, causal models, clustering, and meta-synthesis.

The four tools without exception share a lineage within the meso-level tradition. The authors of the MSC stress the importance of evaluating intermediate impacts. The CM and OH similarly demand focus on meso-level evaluation, such as participant-related structures, routines, or sub-maps. Unlike traditional outcome-driven approaches, the OH, MSC, and CM do not primarily aim to identify interventions’ results. They focus on changes within the process, such as behavioural shifts, process drivers, emerging patterns, contextual influences, or modifications to standard procedures. The SM also aligns with this perspective. It employs a triadic structure for meso-level evaluation, encompassing three distinct and context-dependent evaluation domains.

Various aggregation procedures exist at the meso level of evaluation, and their collective rationality varies significantly. Representative cases are mixed-methods, ʻmid-rangeʼ, or ʻmulticriteriaʼ approaches that intersect quantitative and qualitative data or methodologies. They refuse to address broader, abstract concepts such as ʻsocietyʼ or ʻsocial cohesionʼ. This means they were not originally meant to answer questions about what is best for the collective or society.

Customary meso-level approaches fall short of an authentic mesoscopic way of reasoning in evaluation. Authentic mesoscopic reasoning acknowledges the existence of opposing poles (Dopfer et al.) in evaluation, a binary construct such as Option A vs. Option B. The synthesis of opposing viewpoints necessitates the introduction of a third, intermediary category that integrates elements of both perspectives. For instance, socio-economic development serves as an intermediary category arising in the intersection between the economic and social domains of sustainable development. An authentic intersectional perspective can empower evaluators to reframe fundamental oppositions from a dualist stance of antagonism to a triadic middle-ground perspective.

A critical distinction exists between absolute and relative thirdness since it separates divisive (inauthentic) from integrative (authentic) mesoscopic structures. The triadic relationship is not constructed here from three opposing domains (A, B, and C — absolute thirdness). Instead, it is formed by two opposing domains (A and B) with a third, intermediary category (ab) that is relative to them. The triadic presentation of the middle ground allows for the coexistence of oppositions while preserving the fundamental distinctions between them.

Authentic meso-level evaluation hinges on this concept of relative thirdness (A, ab, B). According to Charles Sanders Peirce, a famous and highly original American logician, philosopher, and father of the theory of signs, relative thirdness arises between at least three pairs (!) of domains, not between three individual domains: only dyadic structures exist between different domains.

Peirceʼs philosophy posits that phenomena can be apprehended in three distinct basic forms: in firstness as independent quality, in secondness as opposition, or in thirdness as mediation. Firstness governs absolute qualities, secondness governs relative forces, and thirdness governs mediation between oppositions. None of the three forms of logical reasoning has an absolute advantage. Firstness is about the counting of commensurable contents, secondness is about division, and thirdness is about multiplication. All three are indispensable. Monistic firstness is appropriate for classification, logic, and mathematics. Dualist secondness is suitable for causality, dialectics, and correlation. Thirdness is called for in actions that generate integrated meaning from incomplete and inconsistent inputs (Peirce).

From the triadic frame of relative thirdness, Peirce elaborated the concept of ʻsecondness of thirdnessʼ. Secondness represents the opposing poles, while the triadic element functions as an integrator. The secondness of thirdness bridges the divide between two-part and three-part reasoning. Through the secondness of thirdness, the observer sees the world in a three-dimensional perspective, through the dialectical relationships between the domains in which the world is evaluated at the meso level.

Relative thirdness fosters the most authentic middle-level reasoning. It can be formalised with a meso-matrix, which is a square matrix. It contains at least three evaluation domains (such as the economic, social, and environmental domains of sustainable development), organized in three rows presenting complex interventions and three columns presenting assessment criteria. Secondness serves to organize and correlate the dyadic relationships between pairs of domains. Thirdness finally interprets the results obtained from the meta-overlap between correlated pairs (which corresponds to the concept of the ʻMeso 3 sublevelʼ in Radej, 2021a).
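The layout of such a matrix can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the impact scores, the scale from -2 to +2, the helper names `pair_relation` and `label`, and the labels "synergy", "conflict", and "mixed" are all hypothetical and are not taken from Radej (2021a).

```python
from itertools import combinations

# Hypothetical 3x3 meso-matrix for sustainable development.
# Rows: interventions grouped by domain. Columns: assessment criteria by domain.
# Scores are illustrative impacts on an assumed scale from -2 to +2.
DOMAINS = ["economic", "social", "environmental"]
matrix = [
    [2, 1, -1],   # economic interventions
    [1, 2, 0],    # social interventions
    [-2, 1, 2],   # environmental interventions
]

def pair_relation(i, j):
    """Secondness: the two off-diagonal cells linking domains i and j."""
    return matrix[i][j], matrix[j][i]

def label(relation):
    """Assumed labelling rule, used here only to make the pairs readable."""
    a, b = relation
    if a > 0 and b > 0:
        return "synergy"
    if a < 0 and b < 0:
        return "conflict"
    return "mixed"

# One dyadic reading for each of the three pairs of domains.
pairs = {
    (DOMAINS[i], DOMAINS[j]): label(pair_relation(i, j))
    for i, j in combinations(range(len(DOMAINS)), 2)
}

for pair, reading in pairs.items():
    print(pair, reading)
```

The code stops at secondness on purpose. Reading the three pairwise results together, which is the task of thirdness, is an interpretive step for evaluators and participants and is not computed here.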

The meso-matrix approach assumes a two-phase aggregation synthesis: first, from the micro level of participants to the meso level of evaluation domains (groups, themes, sectors…), and then from the meso level to the meta-level of overlap between all evaluation domains.[4] The concept of secondness of thirdness can be illustrated using a Venn diagram. It depicts three (or more) partial overlaps between three (or more) circles, culminating in a meta-overlap between partial overlaps.
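The two phases and the Venn-diagram picture can be illustrated with Python sets. This is a simplified sketch: the participants, domains, and themes are hypothetical, and treating an overlap as the plain intersection of shared themes is an assumption made for illustration, whereas the synthesis described in the text is interpretive.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical micro-level inputs: (participant, evaluation domain, theme raised).
inputs = [
    ("p1", "economic", "jobs"),
    ("p2", "economic", "housing"),
    ("p3", "social", "jobs"),
    ("p4", "social", "housing"),
    ("p5", "social", "health"),
    ("p6", "environmental", "housing"),
    ("p7", "environmental", "health"),
]

# Phase 1, micro to meso: collect participants' themes per evaluation domain.
meso = defaultdict(set)
for _participant, domain, theme in inputs:
    meso[domain].add(theme)

# Phase 2, meso to meta: partial overlaps between each pair of domains ...
partial = {
    (a, b): meso[a] & meso[b]
    for a, b in combinations(sorted(meso), 2)
}
# ... and the meta-overlap shared by all partial overlaps.
meta = set.intersection(*partial.values())

for pair, shared in partial.items():
    print(pair, sorted(shared))
print("meta-overlap:", sorted(meta))
```

With three circles, the three partial overlaps correspond to the pairwise lens-shaped regions of the Venn diagram, and the meta-overlap corresponds to the central region they have in common.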

The purpose of authentic mesoscopic synthesis differs from both conventional aggregation methods (micro to macro) and inauthentic mesoscopic synthesis (micro to meso, meso to macro, or micro-meso-macro synthesis, the latter exemplified by Dopfer et al.). The ultimate goal is not to reach the macro level, which only reifies a reason-centered or result-based logic that is incompatible with the mesoscopic nature of the problem. In contrast, authentic mesoscopic synthesis reaches only the meta-level, the highest level of synthesis that does not impose a trade-off between inclusivity and collective rationality, because the middle level remains empty of exclusionary logocentric evaluations. While the macro-level is full of itself, the meta-level is empty of essence. ʻBreak a legʼ in the theatre means ʻgood luckʼ and should be understood at a meta-level, beyond the direct meaning of the words (macro). The meta-level induces an inversion of meaning at the intersection of the literal sense with tone, context, or cultural factors, when different readings are expressed through the void, sometimes only as irony, satire, or sarcasm.

The authentic middle is, of course, empty. Standing in the empty middle allows an evaluator and participants to become blindsighted. They recognise that they are biased and blinded, and that they stand in the void regarding what is best for all. The blindsighted evaluator does not choose between binary options, yet is not a relativist. A blindsighted evaluator assesses complex issues in categories of the middle ground, free from preconceived notions about which values, knowledge, or preferences should be prioritized. A blindsighted evaluator explains indeterminate things with generative paradoxes that can be meaningful and policy-relevant only when approached from the empty middle, while remaining enigmatic, preposterous, or nonsensical to all others.

[1] Reorganised into the Regulatory Scrutiny Board in 2015. Accessed May 2024.

[2] There are as many ways of constructivist synthesis as there are those who observe uncertain things, because everyone sees them differently. For example, Mann collected 1000 different constructs of the concept of sustainable development designing 1000 different approaches to integrating its integral domains. Mann S. 2011. Sustainable Lens: A visual guide. NewSplash Studio, Dunedin. 206 p.

[3] Dick B. 2000. Delphi face to face. Accessed May 2024.

[4] Aggregation from the micro to the meso level does not differ from conventional approaches to aggregation, except that it is accomplished differently for different domains.

Written by Bojan Radej