The Ottawa Surgical Competency Operating Room Evaluation (OSCORE) is an assessment tool that has gained prominence in postgraduate competency-based training programs. We undertook a systematic review and narrative synthesis to articulate the underlying validity argument in support of this tool. Although originally developed to assess readiness for independent performance of a procedure, contemporary implementation includes using the OSCORE for entrustment supervision decisions. We used systematic review methodology to search, identify, appraise and abstract relevant articles from 2005 to September 2020, across MEDLINE, EMBASE and Google Scholar databases. Nineteen original, English-language, quantitative or qualitative articles addressing the use of the OSCORE for health professionals' assessment were included. We organized and synthesized the validity evidence according to Kane's framework, articulating the validity argument and identifying evidence gaps. We demonstrate a reasonable validity argument for the OSCORE in surgical specialties, based on assessing surgical competence as readiness for independent performance for a given procedure, which relates to ad hoc, retrospective, entrustment supervision decisions. The scoring, generalization and extrapolation inferences are well-supported. However, there is a notable lack of implications evidence focused on the impact of the OSCORE on summative decision-making within surgical training programs. In non-surgical specialties, the interpretation/use argument for the OSCORE has not been clearly articulated. The OSCORE has been reduced to a single-item global rating scale, and there is limited validity evidence to support its use in workplace-based assessment. Widespread adoption of the OSCORE must be informed by concurrent data collection in more diverse settings and specialties.