Expert Consensus on Theory Development Methodology

Version 1.0.2

General description

Title: Expert Consensus on an Open-Textbook for Theory Development Methodology
Short title: Expert Consensus on Theory Development
Acronym: ECTO

Summary: This project aims to develop a collaborative textbook and curriculum on theory creation, development, and evaluation in psychological science through expert consensus.
PI: N.N.N. van Dongen (Erasmus University Rotterdam)
Advisors: B. Aczel, B. Haig, O. Perski, B. Jorg
Version note: This Project Description is not final. It will be updated after we finalize: (a) the expert identification procedure and inclusion/exclusion criteria, (b) the literature search strategy for building Version 0.0, and (c) the detailed design of the expert consensus (Delphi) rounds. Further updates may also occur as the project progresses.


1. Target Output

1.1 Aim

To develop a Delphi-based, expert-consensus textbook and curriculum that specifies:

1) what theorists in psychology should do (tasks/outputs),
2) the skills required to perform those tasks (learning goals),
3) the subjects/topics needed to learn those skills (chapter structure),
4) the content and learning objectives per chapter,
5) assignments that train the relevant skills, and
6) evaluation methods and rubrics for assessing student performance.

1.2 Primary Deliverables

  • Textbook (Version 1.0): “Methodology for Theory Creation, Development, and Evaluation in Psychological Science” (open, citable).
  • Instructor’s Guide: assignments bank, rubrics, grading/evaluation matrix.
  • Open materials: protocols, datasets of anonymized Delphi responses, project documents, and revision histories (GitHub).
  • Scholarly outputs: journal publication describing the consensus procedure and its results.

1.3 Scope

The textbook focuses on pre-empirical theory work (idea generation, specification/formalization, explanation, prediction, model–theory relations) and interfaces with empirical testing (deriving hypotheses, evaluating explanatory power) without duplicating general research-methods content.


2. Expert Panel Selection

2.1 Expert Identification

Experts will be identified and invited according to the Expert Identification Protocol.

2.2 Eligibility Screen (Round-0)

Invitees will be asked to rate agreement (7-point Likert) with the statement:

“The role of the psychological theorist can be systematically trained and evaluated via explicit procedures, skills, and curricula.”

  • Include if rating ≥ 3 (provisionally; final threshold may be adjusted in v1.0).
  • Exclude if rating ≤ 2 or unwilling to participate in multi-round Delphi.

2.3 Inclusion Criteria

  • Demonstrated contributions to theory development, formal modeling, philosophy of science, methodology, or pedagogy in psychology or neighboring fields (determined by the Expert Identification Protocol)
  • Commitment to at least 12 iterative rounds (estimated 20–40 minutes per round).
  • No conflicts of interest (e.g., direct commercial stake in competing proprietary curricula).
  • No prior substantiated ethical violations relevant to research/teaching integrity.

2.4 Qualification Assessment

When invited, potential experts will be asked to rate their expertise on:

  • Substantive theory development
  • Methodology as it pertains to theory development and assessment
  • Teaching and teaching-related pedagogy

This self-assessment will be conducted via a short survey with Likert-scale questions. Survey results will not be used as inclusion/exclusion criteria, because different groups of people may differ systematically in how they rate the same level of skill.

2.5 Consent, Confidentiality, and Data Protection

  • Digital informed consent will be obtained before Round 1 (see Informed Consent Form, LINK TBA).
  • Anonymity of responses within rounds; names may be acknowledged in outputs only with explicit permission.
  • Data are stored on secure university servers and on a GitHub repository, with anonymized public release of aggregate summaries.
  • GDPR-compliant data handling; right to withdraw at any time without penalty.

3. Panel Size and Retention

3.1 Panel Size

Target: N ≈ 10 experts

  • Minimum viable N = 6
  • Maximum sustainable N = 15

We aim for a balance across: discipline (psychology, philosophy of science, computational modeling, education), methodology (formal & conceptual), geographic region, and gender.

3.2 Replacement and Retention

  • Over-recruit by approximately 30% above the minimum to offset attrition.
  • If, after a phase (see Section 5), the number of experts has dropped below 10, we will recruit additional experts until the panel again comprises at least 10 experts.
  • If the number of experts drops below 6 during a phase (or at any other point), we will immediately recruit additional experts using the same criteria.

4. Literature search and Input material for the Delphi phases

A systematic literature search will be conducted to assemble a representative, interdisciplinary corpus of theoretical, methodological, and pedagogical works relevant to the creation and evaluation of scientific theories in psychology. This corpus will serve as the evidence base and initial content for developing the textbook and generating Delphi survey materials.
Target: ~50 sources are expected to be included.

Material for the Delphi procedure will be developed in phases (see Section 5 below). The PI will prepare a Version 0.0 of this material based on the sources identified through the literature search. An Evidence Map will link the sources to this material. See the Search Strategy for further details.


5. Study Design: Delphi Phases and Instruments

We align six Delphi phases to the project’s core questions. Each phase proceeds in iterative rounds (survey + synthesis feedback), then feeds the next phase.

Phase 1. Tasks and Responsibilities of a Theorist

Goal: Achieve consensus on what theorists do/create; what can/should be delegated.
Round-1 instrument: Initial item list and open elicitation (Likert 1–7 on importance/centrality; free-text suggestions).
Output: Consensus list of the central tasks and responsibilities of the theorist.

Phase 2. Skills & Learning Goals

Goal: Map tasks to skills/competencies; identify which are trainable in an academic setting and for which training already exists.
Instrument: For each task, rate required skills (importance, teachability, prerequisite level); indicate existing training domains (math, programming, data science, philosophy, etc.).
Output: Competency matrix (task × skill) → global learning outcomes.

Phase 3. Subjects/Topics (Chapter Blueprint)

Goal: Translate skills into subjects/topics and a draft Table of Contents (ToC).
Instrument: Rate topic relevance, dependency structure, and placement; propose additions/removals; rank ordering.
Output: Textbook ToC v0.0 (sections, chapters, learning objectives per chapter).

Phase 4. Chapter Content & Learning Objectives

Goal: Specify per-chapter content elements and readings; link to learning outcomes.
Instrument: For each chapter: essential concepts, examples/case studies, formalization tasks, key readings; rate coverage sufficiency and clarity.
Output: Chapter outlines v0.0 (LOs, sections, worked examples, references).

Phase 5. Assignments & Practice

Goal: Curate assignments that train targeted skills (conceptual, formal, computational, reflective).
Instrument: Rate assignment–skill alignment, difficulty, time on task, feasibility; propose rubrics.
Output: Assignment bank mapped to chapters and skills.

Phase 6. Evaluation & Rubrics

Goal: Define performance criteria and rubrics for theory education (conceptual rigor, formal precision, explanatory coherence, collaboration).
Instrument: Rate rubric criteria (validity, reliability, fairness); propose evidence of mastery; align with program outcomes.
Output: Evaluation matrix & sample rubrics (instructor guide).


6. Adding and Removing Items

During each consensus round, experts can propose to add or remove items. The PI will evaluate the argumentation for the addition/removal and propose a decision for the next consensus round. If a two-thirds majority agrees with the decision, the item is added to or removed from the material.
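The two-thirds rule (used again for the mid-project evaluation in Section 7) can be sketched as a small helper. This is illustrative only; the function name and the choice to round the threshold up are assumptions, not part of the protocol:

```python
import math

def two_thirds_majority(votes_in_favor, panel_size):
    """True if at least two-thirds of the full panel agrees with the decision.

    Rounds the threshold up, so a panel of 10 needs 7 votes (an illustrative
    interpretation; the protocol does not specify how to round).
    """
    return votes_in_favor >= math.ceil(2 * panel_size / 3)
```

For example, with a panel of 10 experts the threshold is ceil(20/3) = 7 votes, so 6 votes in favor would not carry the decision.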

7. Consensus Criteria and Stopping Rules

For each phase, items will be rated on 7-point Likert scales (1 = strongly disagree / not relevant; 7 = strongly agree / highly relevant). We stop a phase when:

1) Consensus: every item in the phase set achieves a mean ≥ 4.5, no single item receives a score below 3.0 from more than 30% of the experts, and no new items are proposed in the final feedback round; or
2) Rounds cap: a maximum of 5 rounds is reached for the phase. If consensus is incomplete, the phase output will be published with an addendum noting the lack of full consensus and reporting item-level agreement statistics.

During each round, items that receive a score below 3 from no more than 25% of experts and have a mean rating of 4.5 or higher will be designated as “consensus reached” and dropped from the next consensus round. Thus, once consensus has been reached on an item, it will not be reevaluated in the following rounds.
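The item-level rule above can be expressed as a small check on one item's ratings. This is a minimal sketch, assuming ratings arrive as a list of integer scores per item; the function and parameter names are illustrative, not part of the protocol:

```python
def item_consensus(ratings, mean_threshold=4.5, low_cutoff=3, max_low_share=0.25):
    """Check the per-round consensus rule for one item:
    mean rating >= 4.5, and no more than 25% of experts rate it below 3.
    """
    if not ratings:
        return False
    mean = sum(ratings) / len(ratings)
    # Share of experts who gave this item a score below the low cutoff
    low_share = sum(1 for r in ratings if r < low_cutoff) / len(ratings)
    return mean >= mean_threshold and low_share <= max_low_share
```

For example, ratings [5, 6, 4, 7, 5, 2, 6, 5] have a mean of 5.0 and one score below 3 (12.5% of eight experts), so the item would be marked “consensus reached” and dropped from subsequent rounds.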

A mid-project evaluation will be triggered when Phase 3 has been completed and more than 12 feedback rounds have been conducted. This evaluation will be used to decide how to proceed. The options are:

  1. The last three phases are conducted as planned.
  2. The project is terminated, and only the results up to and including Phase 3 are published.
  3. An alternative form for completing phases 4, 5, and 6 is found.

An online meeting will be planned with the experts to discuss these options. Option 1 is the default, and a two-thirds majority is needed to initiate options 2 or 3.


8. Analysis and Synthesis Plan

  • Quantitative: For each item—mean, median, IQR, % agreement (≥4), change across rounds; visualize movement toward consensus.
  • Qualitative: Items will be revised based on comments made by the experts. If required, thematic synthesis will be used for systematic assessment and application of these comments.
  • Versioning: Every phase produces a versioned artifact (ToC v0.0 → v0.1 …; Chapter outlines v0.0 → v0.1 …) with changelogs.
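
The per-item quantitative summary can be sketched with the Python standard library. This is a minimal illustration; the dictionary keys and the use of the inclusive quartile method are assumptions, not a specification of the actual analysis code:

```python
import statistics

def item_summary(ratings):
    """Summarize one item's 7-point ratings: mean, median, IQR, and
    percent agreement (share of ratings >= 4), as listed in the analysis plan.
    """
    # Quartiles over the observed ratings (inclusive method treats the
    # data as the whole panel rather than a sample)
    q = statistics.quantiles(ratings, n=4, method="inclusive")
    return {
        "mean": statistics.mean(ratings),
        "median": statistics.median(ratings),
        "iqr": q[2] - q[0],
        "pct_agree": 100 * sum(1 for r in ratings if r >= 4) / len(ratings),
    }
```

Computed per round, these summaries can then be compared across rounds to visualize movement toward consensus.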

9. Post-Consensus Amendments

We allow controlled amendments after consensus if:

1) Peer review of outputs recommends changes; or
2) New evidence (e.g., published pedagogy/methods research) warrants updates.

Procedure: propose amendment → rapid mini-Delphi (≤3 rounds) among the existing panel on affected items only. If the criteria in §7 are re-met, the artifact remains “consensus-based.” If not, it is published with an addendum describing the dissent and the relevant statistics.


10. Materials, Data, and Transparency

  • Project description and logbook: GitHub and Zenodo, versioned (this v0.0 → v1.0 at launch of Round-1).
  • Data storage: All materials produced during this project will be stored on EUR SURF Yoda, in line with the Data Management Plan of Erasmus University Rotterdam. No data will be destroyed upon completion of the project; data will be stored for a minimum of 10 years.
  • Instruments & codebooks: Public on GitHub and Zenodo; pilot drafts may be private until finalized.
  • Data sharing: Public on GitHub. Anonymized item-level ratings and aggregated phase summaries; free-text comments shared in de-identified, paraphrased, or redacted form where needed to protect identities.
  • Reproducibility: Analysis code and rendering scripts shared under a permissive license.
  • Use of LLMs and other AI tools: LLMs will be used to create first drafts of material and consensus reports. All LLM output will be checked by human experts and rewritten to suit the needs of this project. No sensitive or proprietary information will be used as input without authorization.

11. Ethical Considerations

  • Review body: Ethics Review Committee Psychology, Erasmus School of Social and Behavioural Sciences (submission planned alongside v1.0).
  • Risk level: Minimal risk (expert opinion study).
  • Confidentiality: Expert responses are anonymized in all reports. Per-round survey responses are not anonymized, because the PI must be able to ask experts for clarification.
  • Withdrawal: Participants may withdraw at any time; data will be removed upon request where feasible.
  • Compensation: None

12. Deviations from Plan

Any deviations (e.g., panel size changes, altered thresholds, additional rounds) will be documented, timestamped, and justified with updated version numbers.


13. Authorship, Credit, and Conflicts

  • Manuscript authorship follows standard scholarly criteria (substantial contributions to design/analysis/writing). Consensus panel members will be coauthors on all scholarly output, unless they opt out.
  • Conflicts of interest: All investigators and experts will disclose relevant conflicts at enrollment; disclosures will be posted on GitHub.
