Guidance: MCC Guidelines for Transparent, Reproducible, and Ethical Data and Documentation (TREDD) | March 2020

11. Glossary

beneficence: an ethical principle of research that incorporates two ideas: (i) do no harm and (ii) maximize possible benefits.

computational reproducibility: the practice of running the same code over the same data and obtaining the same results as those presented in the originally reported analysis.
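
As an illustration only (not an MCC procedure), the check described in this definition can be sketched by rerunning the same code on the same data and comparing a fingerprint of the results with the one recorded at reporting time. The function `run_analysis`, the helper `fingerprint`, and the data values are hypothetical stand-ins.

```python
import hashlib
import json
import statistics

def run_analysis(values):
    """Hypothetical stand-in for an analysis script: a few summary statistics."""
    return {
        "n": len(values),
        "mean": round(statistics.mean(values), 6),
        "median": round(statistics.median(values), 6),
    }

def fingerprint(results):
    """Hash a results dictionary using a stable key order."""
    encoded = json.dumps(results, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

# Made-up values standing in for the archived analysis data.
data = [12.5, 7.0, 9.25, 14.0, 7.0]

reported = fingerprint(run_analysis(data))  # recorded with the original report
rerun = fingerprint(run_analysis(data))     # produced by a later re-execution

is_reproduced = (reported == rerun)
print(is_reproduced)  # True
```

Real analyses may need a tolerance for floating-point differences across machines; exact hash comparison is the strictest form of the check.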

contractor: any firm or individual hired by MCC or an MCA to conduct a data activity.

country partner: as defined in Section 1.2, each country government partner receiving MCC assistance in the form of a compact or threshold program grant agreement.

data: individual, household, community, contextual, and entity-level information that MCC and its country partners collect, produce and/or use to inform investment decisions, operations, or monitoring and evaluation activities for MCC-funded assistance programs.

data activity: any action involving the designing, collecting, storing, analyzing, or sharing of data (e.g., the conduct of an independent evaluation is a data activity).

data confidentiality: the protection of data from unauthorized access or disclosure. Measures taken to maintain data confidentiality include, but are not limited to, data encryption; maintaining an authorized access list and/or requiring non-disclosure agreement(s); and knowing and possessing authorized rules for handling, storing, and transferring data with approved methods (e.g., encryption).

data de-identification: general term for any process of removing the association between a set of identifying data (direct and indirect identifiers) and the data provider. De-identification includes all techniques that allow access to data while simultaneously limiting the opportunity for unwanted disclosure.

data handler: any person (individual or legal entity) who collects, stores, analyzes, and/or shares data.

data integrity: the accuracy and consistency of the data, ensuring the data is unchanged, intact, and complete. Data integrity is achieved by protecting data confidentiality and authenticity and by limiting modification to authorized users or events.
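
One common way to verify that data is "unchanged, intact, and complete" after storage or transfer is to compare checksums. The sketch below assumes SHA-256 and uses made-up file contents. It is not a method prescribed by these guidelines, and it detects modification without saying who made it.

```python
import hashlib

def sha256_of(content: bytes) -> str:
    """Return the SHA-256 checksum of a byte string as hex text."""
    return hashlib.sha256(content).hexdigest()

# Made-up file contents; the checksum is recorded when the file is created.
original = b"id,income\n1,200\n2,350\n"
recorded_checksum = sha256_of(original)

# Two hypothetical copies received later: one intact, one with a changed value.
received_intact = b"id,income\n1,200\n2,350\n"
received_altered = b"id,income\n1,200\n2,530\n"

intact_ok = sha256_of(received_intact) == recorded_checksum
altered_ok = sha256_of(received_altered) == recorded_checksum

print(intact_ok, altered_ok)  # True False
```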

data perturbation: methods used to alter data in order to mitigate risks to the data provider (e.g., removal of PII/sensitive data; top/bottom coding of outliers).
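
Top/bottom coding, mentioned above, replaces values beyond chosen thresholds with the thresholds themselves so that extreme values no longer single out a data provider. A minimal sketch follows; the helper name `top_bottom_code`, the thresholds, and the income values are assumptions chosen for illustration.

```python
def top_bottom_code(values, lower, upper):
    """Clamp each value into the range [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]

# Made-up incomes; 50 and 10_000 are the outliers to be masked.
incomes = [50, 120, 300, 950, 10_000]
coded = top_bottom_code(incomes, lower=100, upper=1_000)

print(coded)  # [100, 120, 300, 950, 1000]
```

In practice the thresholds are usually chosen from the data (for example, selected percentiles) and recorded in the documentation.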

data provider: any individual, household, community, or other entity who provides data.

direct identifiers: data that directly identify a person (individual or legal entity). These data may include full name, date of birth, mailing or home address, email address, telephone number, GPS coordinates, national identification number, and physical/biological identifiers (e.g., physical appearance captured through photo or video data collection, fingerprints, or DNA). Depending on the study and data needs, direct identifiers can also include the name of the school, health facility, community, etc. that directly identifies the location of the data collection or extraction.

documentation: written materials that disclose the methods behind data activities, including but not limited to Design Reports, Baseline Reports, Interim/Final Reports, Evaluation Briefs, questionnaires, Transparency Statements, Statements of Difference/Support, peer review comments and responses.

Disclosure Review Board (DRB): the administrative body established by MCC in 2013 to (i) develop, review and approve guidelines and procedures (including modifications thereto) for data activities; (ii) review and approve proposals related to data disclosure; and (iii) notify the MCC Incident Response team in the event of an identified, specific disclosure risk (spill, breach, etc.) and follow MCC protocol for risk management.

Evaluation Design Report (EDR): standard contract deliverable for independent evaluation contractors in which the evaluation design is fully documented and approved by the relevant MCC Evaluation Management Committee.

HHS: the United States Department of Health and Human Services.

indirect identifier/quasi-identifier: data that can be used to identify a person (individual or legal entity) through association with one or more other variables. These include unique, observable or other characteristics that may identify a specific data provider (or household, community, school, etc.) even when direct identifiers are removed.
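
To illustrate how indirect identifiers can single out a data provider in combination, the sketch below counts how many records share each combination of values and flags combinations that occur only once. The records and the variable names (`village`, `age_group`, `occupation`) are made up, and uniqueness counting is only one possible screening approach.

```python
from collections import Counter

# Made-up de-identified records with no direct identifiers.
records = [
    {"village": "A", "age_group": "30-39", "occupation": "farmer"},
    {"village": "A", "age_group": "30-39", "occupation": "farmer"},
    {"village": "A", "age_group": "60-69", "occupation": "teacher"},
    {"village": "B", "age_group": "30-39", "occupation": "farmer"},
]
quasi_identifiers = ("village", "age_group", "occupation")

def key(record):
    """Combination of quasi-identifier values for one record."""
    return tuple(record[name] for name in quasi_identifiers)

counts = Counter(key(r) for r in records)

# Rows whose combination is unique are the easiest to re-identify.
unique_rows = [i for i, r in enumerate(records) if counts[key(r)] == 1]

print(unique_rows)  # [2, 3]
```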

informed consent: action required in research to operationalize respect for persons, where research subjects or data providers are informed of the objectives, duration, and description of the research, its expected benefits and risks, promises of confidentiality, how and with whom data will be shared, and that their participation is voluntary.

Institutional Review Board (IRB): an administrative body established to assure that appropriate steps are taken to protect the rights and welfare of humans participating as subjects in research. To accomplish this purpose, IRBs use a group process to review, both in advance and periodically, research protocols and related materials (e.g., informed consent documents and investigator brochures) to ensure protection of the rights and welfare of human subjects of research.

justice: in research, the just distribution of the risks and burdens of the research and the benefits expected to be produced by the research.

linkage documentation: documents and other materials unrelated to the applicable data activity but that may support re-identification efforts or at least undermine de-identification efforts.

MCC staff: as defined in Section 1.2, individuals employed by MCC.

personally identifiable information (PII): information that can be used, on its own or in conjunction with other information that is linked or linkable to a specific individual (or household, community, school, etc.), to determine the identity of a data provider or otherwise locate or contact the data provider. PII includes both direct and indirect (or quasi) identifiers.

p-hacking: also known as “data-mining” or “specification search”; the practice of testing multiple analytical alternatives in order to obtain a statistically significant result. Examples include restricting the sample, testing subgroups, or redefining variables after looking at the final data.
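
A short calculation shows why testing many alternatives matters. Assuming the tests are independent and no true effect exists (simplifying assumptions, not claims from these guidelines), the chance of at least one "significant" result at level alpha across k tests is 1 - (1 - alpha)^k. The function name below is hypothetical.

```python
def chance_of_false_positive(k, alpha=0.05):
    """Probability of at least one significant result in k independent
    tests when no true effect exists."""
    return 1 - (1 - alpha) ** k

# Number of analytical alternatives tried, and the resulting probability.
for k in (1, 5, 20):
    print(k, round(chance_of_false_positive(k), 3))
```

With twenty alternatives the probability is roughly 64 percent, which is why a single significant result found after a specification search carries little evidential weight.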

primary data handlers: MCC M&E staff and Contractor Key Personnel.

re-identification: any process that restores the association between a set of de-identified data and the data provider. 

reporting guidelines: a standardized procedure to report on study design, implementation, analysis, and interpretation of findings.

reproducibility/credibility crisis: general term for research findings that document several problems across scientific fields, including low rates of computational reproducibility and a high prevalence of publication bias and p-hacking.

research protocol: a tool for documenting the planned research design and practices of a research activity, governing the activity’s implementation, and communicating its objectives and expected contributions.

researcher: an individual working for a contractor to lead one or more data activities.

respect for persons: an ethical principle of research that incorporates at least two ideas: (i) individuals are treated as autonomous agents and (ii) individuals with diminished autonomy are entitled to protection. In most cases, respect for persons requires that research subjects or data providers enter into the research voluntarily and with adequate information.

sample frame: the list from which units are drawn for a sample. The ‘list’ may be an actual listing of units, as in a phone book from which phone numbers will be sampled, or some other description of the population, such as a map from which areas will be sampled.
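
As a small illustration of drawing units from a frame, the sketch below takes a simple random sample from an explicit list. The school names, sample size, and seed are made up; fixing the seed is one way to make the draw repeatable.

```python
import random

# Made-up sample frame: an explicit listing of units.
frame = ["School 01", "School 02", "School 03", "School 04",
         "School 05", "School 06", "School 07", "School 08"]

def draw_sample(units, size, seed):
    """Simple random sample without replacement, repeatable for a given seed."""
    rng = random.Random(seed)
    return rng.sample(units, size)

sample = draw_sample(frame, size=3, seed=2020)
repeat = draw_sample(frame, size=3, seed=2020)

print(sample == repeat)  # True
```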

sample unit: the single value by which an aggregate sample is divided; each sample unit is regarded as individual and indivisible when the selection is made (for example, in an education evaluation, the sample units may be the (i) schools, (ii) teachers, (iii) households, and (iv) children).

sensitive data: information that may pose a risk to the data provider if it is collected or released in a way that is linkable to the data provider (e.g., income, assets or health status).

study registration: a public, brief description of a study before data is available for analysis.

Transparency Statement: the contractor-authored document that records the extent to which analysis in a published report can or cannot be reproduced with available data and documentation, along with justifications for why reproduction cannot be facilitated.

vulnerability: refers to a diminished ability to fully safeguard one’s own interest in the context of a specific research project. This may be caused by limited decision-making capacity or limited access to social goods, such as rights, opportunities, and power. Individuals or groups may experience vulnerability to different degrees and at different times, depending on their circumstances.