PURPOSE
The Clinical Data Acquisition Standards Harmonization (CDASH) Model, along with the CDASH Implementation Guide (CDASHIG) and its Metadata Table, establishes essential standards for clinical trial data collection and the implementation of case report forms (CRFs). By standardizing data collection methods across studies and sponsors, CDASH ensures consistent formats and structures, enhancing the traceability of submission data into the Study Data Tabulation Model (SDTM). This transparency benefits regulators and data reviewers, while the CDASH standard also plays a crucial role in developing clinical data collection instruments.
• Consistency and detail in representations of research protocol concepts
• Streamlined processes within medical research
• Development of a corporate library of standardized CRFs
• Post-marketing studies/safety surveillance
Global recognition of industry standards is essential for effective data interchange between clinicians and researchers, facilitating better integration of medical research findings into clinical care. By aligning clinical research with healthcare standards, CDISC promotes the adoption of global standards that enhance interoperability between healthcare and research systems. This integration encourages more clinicians to engage in research, ultimately supporting informed healthcare decisions and improving patient safety.
This document serves as a guide for individuals involved in the planning, collection, management, and analysis of clinical trials and data, including clinical investigators, monitors, research associates, coordinators, data managers, programmers, biostatisticians, and drug safety monitors. While primarily focused on the development of electronic case report forms (eCRFs), the CDASH standard is also applicable to various data sources. Its principles and metadata can be utilized for eSource data, encompassing vendors' electronic data transfer standards, electronic patient-reported outcomes (ePRO) structures, and direct data acquisition from electronic health record (EHR) systems.
ORGANIZATION OF THIS DOCUMENT
General Notes
Throughout this document, a deliberate decision was made to use a variety of synonyms for various terms in order to reflect the fact that sponsors also use a variety of terms
• Paper CRFs vs electronic CRFs: The term CRF used throughout this document refers to both paper and electronic formats, unless otherwise specified
• Fields vs variables: Data collection fields refers to terms that are commonly on the CRF. Data collection variables refers to what is in a clinical database.
The term "study treatment" encompasses various forms of investigational products, including study drugs, vaccines, medical devices, and other test articles, thereby unifying the terminology across different study designs and products.
Various mechanisms for data collection, such as tick boxes, check boxes, radio buttons, and drop-down lists, can be utilized to manage the way data is gathered. In this document, these terms are considered synonymous.
THE THREE COMPONENTS OF THE CDASH STANDARD
CDASH Model
The CDASH Model v1.1 offers a comprehensive framework for designing fields to gather data on Case Report Forms (CRFs) and includes essential model metadata that outlines standard variables. Additionally, the CDASH Implementation Guide (CDASHIG) provides guidance on applying the CDASH Model and features the CDASHIG Metadata Table, which specifies further details for data collection variables across various domains.
The CDASH Model v1.1 establishes root naming conventions for CDASHIG variables to streamline mapping to SDTMIG variables, using the same "--XXXX" naming format as SDTM. The two dashes are replaced by the domain code to form the corresponding CDASHIG variable. For instance, the root variable --DOSFRQ represents Dosing Frequency per Interval in the Interventions class; applying the domain abbreviation "CM" yields CMDOSFRQ, the dosing frequency of a concomitant medication. The CDASH Model also encompasses metadata for variables across all SDTM general observation classes, including Timing variables, Identifier variables, variables for Special Purpose domains, and domain-specific variables; further details are available in Section 3.5.1.
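The substitution of the two-dash placeholder can be sketched in a few lines of Python. This is an illustration only: the function name `apply_domain` is hypothetical, and it assumes root names are written with a leading "--" (for example "--DOSFRQ").

```python
def apply_domain(root_variable: str, domain_code: str) -> str:
    """Turn a root name such as "--DOSFRQ" into a domain-level name such as "CMDOSFRQ"."""
    # Assumption: root variables are written with a leading two-dash placeholder.
    if not root_variable.startswith("--") or len(root_variable) < 3:
        raise ValueError(f"expected a root name starting with '--', got {root_variable!r}")
    # Domain codes are two characters (e.g. CM, MH, AE).
    if len(domain_code) != 2 or not domain_code.isalnum():
        raise ValueError(f"expected a 2-character domain code, got {domain_code!r}")
    return domain_code.upper() + root_variable[2:]


if __name__ == "__main__":
    print(apply_domain("--DOSFRQ", "CM"))  # CMDOSFRQ
```

The same helper covers any root variable, so a library of root metadata can be expanded per domain without retyping names.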
CDASHIG
The CDASH Implementation Guide (CDASHIG) offers essential guidance on applying CDASH standards, which encompass both the CDASH Model and the accompanying CDASHIG Metadata Table. It is crucial to reference the informative content of the CDASHIG alongside the normative content found in the metadata table to ensure comprehensive understanding and adherence to the standards.
CDASHIG Metadata Table
The CDASHIG Metadata Table includes only those variables commonly implemented by a significant number of the organizations/companies that provided information/examples (e.g., Medical History, Adverse Events)
Implementers can enhance their CDASHIG domain by incorporating relevant variables from the General Observation class in the CDASH Model. The CDASHIG Domain Metadata showcases the Question Text and Prompts commonly utilized by sponsors. For a comprehensive list of available options regarding Question Text, Prompts, parameters, and interchangeable verb tenses, implementers should consult the CDASH Model.
CDASHIG METADATA TABLE ATTRIBUTES
CRF and Data Management System Design Metadata
Key metadata attributes are crucial for achieving CDASH conformance. When paired with the variable naming conventions outlined in Section 5.1, Conformance Rules, these attributes will guide the design of Case Report Forms (CRFs) and the corresponding database structure to ensure compliance with the standard.
• Question Text (full sentence/question forms to prompt for data) OR Prompts (short phrases, often suitable as column headers, to prompt for data)
• CDISC Controlled Terminology lists and subsets of list values when applicable
• CDASHIG Core designations and implementation notes (which, when used together, can assist a designer in determining the complete set of data to be collected on a form)
SDTMIG Programming Metadata
Columns in the CDASHIG Metadata Table that will assist in developing programs to generate SDTM domain datasets from CDASHIG-compliant data include:
• Subset Controlled Terminology/CDASH Codelist Name
Additional Metadata
Clear and consistent completion instructions for sites are essential for collecting high-quality, reliable data, which is crucial for developing quality pooled and submission data. The CDASHIG Metadata Table features a column dedicated to Case Report Form Completion Instructions, aiding authors in creating effective study-level documentation to guide sites on how to accurately complete CRF fields.
CRF DEVELOPMENT OVERVIEW
The key steps to developing CRFs using CDASH are as follows:
1. Determine the requirements for data domains from the organization's corporate library of standardized CRFs, if one is maintained, or from the protocol data collection requirements for the study.
2. Review the domains published in the CDASHIG to determine which of the data collection domains and fields are already specified in the published domains.
To optimize data collection, use the standard domains wherever possible and base the data collection tools on them.
When developing CDASH-compliant collection instruments like CRFs and eCOA screens, it is crucial to determine the appropriate SDTMIG domain for data mapping. This selection is independent of transmission mode, data generation methodology, storage medium, data recorder, or the subject described. The chosen SDTMIG domain influences the CDASH variable names, question texts, prompts, and controlled terminology used. While CDASH provides a suggested format for data entry, it does not prescribe a specific data structure for the storage of collected data, commonly known as a data management operational database.
For example, one study used a subject-completed patient-reported outcome (PRO) to gather meal consumption diary data, while another study employed a photographic method in which subjects captured images of their meals before and after eating and sent these to a third party for analysis. Despite the differing data collection methods, the meal consumption data from both studies belong in the SDTMIG ML (Meal Data) domain.
In a study, blood samples from subjects are sent to a central laboratory for analysis, with results transmitted to the sponsor through electronic data transfer In a separate study, the samples are
CDASH recommends collecting dates in a clear and unambiguous format, specifically the DD-MON-YYYY format, for data entry. However, this guideline does not specify the electronic storage format, allowing for various legitimate methods of storage. For instance, one system may store dates as character fields, another may use numeric values like SAS dates, while yet another might separate the day, month, and year into three distinct fields.
Using the CDASH Model, add any further variables needed to fulfill data collection requirements, drawing on its root variables and relevant CDASH metadata. Adhere to the CDISC Variable Naming Fragment conventions and CDASH root variable naming standards, such as --DAT for dates, --TIM for times, and --YN for Yes/No prompt questions. Replace "--" with the 2-character domain code used by the other variables in the same domain; for example, to include the --LOC variable in a Medical History CRF, use the domain code "MH", so the variable becomes "MHLOC" in that domain.
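A minimal sketch of this point: one collected DD-MON-YYYY value held in three different storage forms. The function names are hypothetical, English month abbreviations are assumed, and the SAS form relies on SAS date values counting days from 1 January 1960.

```python
from datetime import date

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]
SAS_EPOCH = date(1960, 1, 1)  # SAS date values count days from this date


def parse_collected_date(text: str) -> date:
    """Parse a complete date entered as DD-MON-YYYY (e.g. 05-MAR-2021)."""
    day, mon, year = text.strip().upper().split("-")
    return date(int(year), MONTHS.index(mon) + 1, int(day))


def as_character(text: str) -> str:
    """Storage option 1: keep the value as a character field, as entered."""
    return text.strip().upper()


def as_sas_date(d: date) -> int:
    """Storage option 2: numeric days since 1960-01-01."""
    return (d - SAS_EPOCH).days


def as_components(d: date) -> dict:
    """Storage option 3: three separate fields for day, month, and year."""
    return {"day": d.day, "month": d.month, "year": d.year}


if __name__ == "__main__":
    entered = "05-MAR-2021"
    parsed = parse_collected_date(entered)
    print(as_character(entered), as_sas_date(parsed), as_components(parsed))
```

All three forms carry the same collected value; the choice among them is a system design decision and is not dictated by CDASH.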
The CDASH Model metadata includes Question Text and Prompt columns that offer various formulations for questions on a Case Report Form (CRF). Sponsors can choose to utilize either the Question Text or the Prompt when designing the CRF. The text may feature brackets [ ], parentheses ( ), or forward slashes, indicating possible modifications. Specifically, text within brackets suggests options for verb tense or protocol-specific language, while parentheses present alternatives like singular/plural forms or optional text that can be omitted. Additionally, forward slashes separate optional words available for the sponsor's selection.
Example: The CDASH variable --PERF, from the CDASH Model, has the following Question Text and Prompt:
Question Text: [Were any/Was the] [--TEST/topic] [measurement(s)/test(s)/examination(s)/specimen(s)/sample(s)] [performed/collected]?
Prompt: [--TEST/Topic] [Measurement(s)/Test(s)/Examination(s)/Specimen(s)/Sample(s)]
The sponsor wants to add a question to a CRF that asks whether a lab specimen was collected, using a Yes/No response.
The sponsor selects the CDASH variable --PERF and applies the appropriate domain code to give LBPERF. Either the Prompt or the full Question Text may be used on the CRF.
Because the study requires only one laboratory test, the singular options apply: the topic is the laboratory specimen, so the Question Text can be worded "Was the laboratory specimen collected?"
The Prompt is formed the same way from the shorter template, selecting the laboratory topic and the singular "Specimen" option.
To establish custom domains aligned with the General Observation Classes in the CDASH Model, refer to Section 3.4 for guidance on creating new data collection fields when a CDASHIG field is not defined.
The CDASHIG Metadata Table attributes provide building blocks for the development of a CRF and the underlying database or other data collection structure.
HOW CDASH AND SDTM WORK TOGETHER
The Study Data Tabulation Model (SDTM) and its Implementation Guide (SDTMIG) establish a standardized framework for data submission, while CDASH, positioned earlier in the data flow, outlines essential data collection fields commonly found on case report forms (CRFs). The relationship between SDTM and CDASH is significant, as CDASH is designed to streamline the mapping of data to the SDTM structure. When data can be collected directly in an SDTM-compatible format, the corresponding SDTMIG variable names listed in the CDASHIG Metadata Table should be utilized for data collection. However, if the data requires transformation or lacks a corresponding SDTMIG variable, CDASH provides standardized variable names for effective data collection.
2 CDASHIG Version 2.1 content is based on SDTMIG Version 3.2
All SDTMIG "Required" variables have been addressed, either through direct data collection or by identifying the data needed to derive them. In certain instances, values for SDTMIG variables can be obtained from sources other than the Case Report Form (CRF) or may be generated during the creation of submission datasets, such as --SEQ values.
CDASHIG domains include variables essential for creating the RELREC submission dataset, which is an SDTM dataset that outlines relationships between records for a subject both within and across domains, as well as across datasets. Each dataset collects specific identifiers to accurately identify these relationships, facilitating data merging. This collected information may involve records from the same domain, different domains, or sponsor-defined variables. For instance, the CDASHIG variable CMAENO, which asks for the identifier of adverse events linked to a specific medication or therapy, can be utilized to establish connections between records in the CM dataset and those in the AE dataset.
The CDASH standard introduces specific data collection fields absent from the SDTMIG, such as inquiries about whether any adverse events occurred or any medications were used. These fields facilitate user-friendly data collection and enhance data cleaning and monitoring by verifying that certain fields on the Case Report Form (CRF) are intentionally left blank. The corresponding CDASH variable names, for use in electronic data capture (EDC) systems, are listed in the CDASHIG Metadata Table, ensuring clarity in data collection expectations.
In the case of CMYN, the SDTMIG Variable Name is marked as “N/A,” and the Mapping Instruction specifies that this field is not submitted. When a CDASHIG field such as CMPRIOR or CMONGO indicates that data is not expected in a related date field, the SDTMIG Variable Name identifies the relevant SDTM timing variables along with their corresponding Mapping Instructions.
The CDASHIG Findings domain tables, including Drug Accountability (DA), ECG Test Results (EG), and Vital Signs (VS), are structured similarly to the SDTMIG, listing variable names alongside test examples. Implementers must incorporate protocol-specific tests in the Case Report Form (CRF) layout, utilizing the appropriate values from CDISC Controlled Terminology codelists. For instance, VSTEST values are used to identify tests on the CRF, with corresponding test codes taken from the VSTESTCD codelist. Additionally, synonyms may be employed for lengthy or less recognized xxTEST values, such as using ALT instead of Alanine Aminotransferase.
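The CM-to-AE link described above can be sketched as follows. This is a hedged illustration, not the guide's mapping specification: the RELREC column names are the standard SDTM ones, but the use of CMSPID and AESPID as the identifying variables, the `build_relrec` helper, and the RELID naming pattern are all assumptions made for this example.

```python
def build_relrec(studyid: str, usubjid: str, cm_records: list) -> list:
    """Create paired RELREC rows for each CM record that names a related AE."""
    rows = []
    relationship_number = 0
    for cm in cm_records:
        ae_id = cm.get("CMAENO")
        if not ae_id:
            continue  # no related adverse event was collected for this medication
        relationship_number += 1
        relid = f"CMAE{relationship_number}"  # hypothetical RELID pattern
        # One row per side of the relationship; both rows share the same RELID.
        for rdomain, idvar, idvarval in (
            ("CM", "CMSPID", cm["CMSPID"]),  # assumption: sponsor ID identifies the CM record
            ("AE", "AESPID", ae_id),         # assumption: CMAENO holds the AE sponsor ID
        ):
            rows.append({
                "STUDYID": studyid,
                "RDOMAIN": rdomain,
                "USUBJID": usubjid,
                "IDVAR": idvar,
                "IDVARVAL": str(idvarval),
                "RELTYPE": "",
                "RELID": relid,
            })
    return rows


if __name__ == "__main__":
    collected = [
        {"CMSPID": "1", "CMTRT": "ASPIRIN", "CMAENO": "2"},
        {"CMSPID": "2", "CMTRT": "VITAMIN D", "CMAENO": ""},
    ]
    for row in build_relrec("STUDY01", "STUDY01-001", collected):
        print(row)
```

Only the first medication produces RELREC rows here, because the second has no related adverse event identifier.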
Implementers should utilize the CDASHIG recommendations to determine the necessary data types for collection, while also consulting the SDTMIG and CDISC Controlled Terminology for supplementary metadata, including labels, data types, and controlled terminology.
The CDASH standard does not replicate sections of the SDTM standard; implementers should consult the SDTM and SDTMIG for further details, available on the CDISC website (http://www.cdisc.org/sdtm). This approach ensures the capture of therapeutic area-specific data points and complies with clinical study protocols and local regulatory requirements. For more information, refer to the CDASH Model and the relevant CDISC therapeutic area user guides.
Utilize CDASH recommendations to establish company standards that align with the clinical development stage and specific therapeutic area needs. To maximize the advantages of the CDASH standard, it is essential to avoid creating case report forms (CRFs) on a trial-by-trial basis. Instead, integrate CRFs from a library of approved templates based on the CDASH Model and Implementation Guide whenever possible.
The CDASHIG is organized into sections based on similar data types, with the CDASHIG Metadata Table arranged alphabetically by domain abbreviation within each general observation class. Although CRF layout was not part of the original CDASH project scope, the Metadata Table presents data collection fields in a logical order to promote standardization, along with annotated example CRFs when available. Implementers are encouraged to refer to Section 4.1, which outlines best practices for ordering fields on a CRF.
CORE DESIGNATIONS FOR BASIC DATA COLLECTION FIELDS
The CDASH Team initially aimed to use the SDTMIG Core Designations (Required, Expected, and Permissible) to ensure consistent categorization across CDASH and SDTM standards. However, during the construction of the CDASHIG Metadata Table, it became evident that CDASHIG designations often differ from SDTMIG designations due to variations in data collection and reporting structures. For instance, a variable marked as Required in SDTMIG may not be necessary in CDASHIG if it can be derived in the SDTM datasets rather than being explicitly captured on a CRF. Additionally, while the SDTMIG designation of "Required" enforces a rule against null values, CDASHIG designations do not impose such rules but simply indicate which fields should be included on the CRF.
In order to facilitate classification of the different types of data collection fields, the following categories were used:
• Highly Recommended (HR): A data collection field that should always be on the CRF (e.g., the data are needed to meet a regulatory requirement or are required to create a meaningful dataset)
• Recommended/Conditional (R/C): A data collection field that should be on a CRF when specific conditions are met (e.g., a complete date of birth is generally preferred but may be restricted in certain regions; the time of an adverse event should be collected only if there is another data point with which to compare it). The "condition" is described in the Implementation Notes column of the CDASHIG Metadata Table.
• Optional (O): A data collection field that is available for use.
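The three designations above lend themselves to a simple design-time check. The sketch below is illustrative only: the `Core` enum, the helper name, and the sample metadata are hypothetical, and the designations shown for the AE variables are example values rather than a quotation of the CDASHIG Metadata Table.

```python
from enum import Enum


class Core(Enum):
    HR = "Highly Recommended"
    RC = "Recommended/Conditional"
    O = "Optional"


def missing_highly_recommended(metadata: dict, crf_fields: set) -> list:
    """List Highly Recommended variables that a draft CRF does not include."""
    return sorted(
        variable
        for variable, core in metadata.items()
        if core is Core.HR and variable not in crf_fields
    )


if __name__ == "__main__":
    # Illustrative metadata only; consult the CDASHIG Metadata Table for real designations.
    ae_metadata = {
        "AETERM": Core.HR,
        "AESTDAT": Core.HR,
        "AESTTIM": Core.RC,
        "AESPID": Core.O,
    }
    draft_crf = {"AETERM", "AESTTIM"}
    print(missing_highly_recommended(ae_metadata, draft_crf))
```

R/C fields are deliberately not flagged, since whether they belong on the CRF depends on the condition given in the Implementation Notes.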
FORM-LEVEL CRF INSTRUCTIONS
General Design Considerations for Completion Instructions
For optimal clarity and organization in Case Report Forms (CRFs), details related to the completion of a specific field should ideally be placed alongside that field If the medium or system used for creating CRFs does not allow this, it is acceptable to include field-level instructions at the top of the form within the form-level instruction area In situations where these instructions are extensive or incorporate graphics or flowcharts, a separate guideline for CRF completion may be necessary.
General Content Considerations for Completion Instructions
When creating form-level instructions for a CRF, the following points should be considered:
• The instructions should provide references to protocol sections for the specifics of and/or limitations on the data to be reported
• The instructions should include any special instructions for additional reporting or actions required beyond what is collected on the CRF
• The instructions should include considerations on how data collected on one CRF might have an impact on data that are reported on a different CRF
• The instructions should refer to any other forms that are related to the CRF being completed.
HOW TO CREATE NEW DATA COLLECTION FIELDS WHEN NO CDASHIG FIELD HAS BEEN DEFINED
Introducing new sponsor-defined collection fields can be limited by business regulations, clinical data standards experts, and electronic data capture systems. The CDASHIG guidelines for naming conventions and variable creation are intended to ensure data collection is effective, irrespective of its later integration into SDTM, while also streamlining the process of converting collected data into submission datasets.
Before incorporating new fields into a sponsor's study Case Report Form (CRF), it is essential to review the CDASH Model to determine if an existing root field can effectively address the data collection requirements.
New data collection fields (not already defined in the CDASH Model) will fall under one of the following categories:
For data cleaning purposes, fields such as --YN, which prompts the question "Were there any [interventions/events/findings]?", can be incorporated into a domain but should not be included in SDTM datasets. To implement this, replace the two dashes (--) with the corresponding two-character domain code and utilize generic question text from the CDASH Model as a foundation for the prompt. It is essential to maintain consistent naming conventions when creating custom data cleaning or operational variables.
When collecting data for operational databases, it is essential to use the SDTMIG variable names for fields that directly correspond to these variables, ensuring that the values, data types, meanings, and controlled terminology remain consistent. This practice streamlines the mapping process and enhances data integrity. If necessary, unique extensions can be added to the variable names in the collection database, but the meaning of any collection variable that aligns with an SDTMIG variable must remain unchanged.
In studies where specific fields do not have a direct one-to-one correspondence with SDTM datasets, it is essential to use distinct variable names in the operational database. This applies when the data type collected differs from that of the corresponding SDTMIG variable, or when the SDTMIG variable is derived from the collected data. By implementing different naming conventions, clarity and accuracy in data mapping can be maintained.
For example, a study may collect Findings data in a denormalized format and then convert it to the normalized SDTM structure. In that case, the --TESTCD values serve as the CDASHIG variable names, and the corresponding --TEST values serve as prompts on the CRF. For additional details, refer to Section 8.3.1, General CDASH Assumptions for Findings Domains.
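The conversion from a denormalized collection record to the normalized structure can be sketched as follows. The helper name and the sample record are hypothetical; the test codes SYSBP, DIABP, and PULSE with their test names are used as examples of Vital Signs terminology, and --ORRES is used as the result variable.

```python
def normalize_findings(record: dict, domain: str, test_labels: dict) -> list:
    """Turn one denormalized CRF record into one row per test."""
    rows = []
    for testcd, label in test_labels.items():
        value = record.get(testcd)
        if value in (None, ""):
            continue  # the test was not collected on this record
        rows.append({
            "USUBJID": record["USUBJID"],
            f"{domain}TESTCD": testcd,   # the collection variable name becomes the test code
            f"{domain}TEST": label,      # the CRF prompt becomes the test name
            f"{domain}ORRES": str(value),
        })
    return rows


if __name__ == "__main__":
    vs_tests = {
        "SYSBP": "Systolic Blood Pressure",
        "DIABP": "Diastolic Blood Pressure",
        "PULSE": "Pulse Rate",
    }
    collected = {"USUBJID": "STUDY01-001", "SYSBP": 120, "DIABP": 80, "PULSE": ""}
    for row in normalize_findings(collected, "VS", vs_tests):
        print(row)
```

Each collected column becomes its own row, and tests left blank on the CRF produce no row at all.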
Dates and times are collected in a user-friendly local format for CRF users and subsequently converted to the ISO 8601 format specified by SDTM. The CDASH variables --DAT and --TIM, when collected in the operational database, are combined into the single SDTM variable --DTC.
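A minimal sketch of that combination, assuming the date was entered as DD-MON-YYYY with English month abbreviations and the time as 24-hour HH:MM. The function name `to_dtc` is hypothetical.

```python
from typing import Optional

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def to_dtc(dat: str, tim: Optional[str] = None) -> str:
    """Combine a collected date and optional time into one ISO 8601 value."""
    day, mon, year = dat.strip().upper().split("-")
    iso = f"{int(year):04d}-{MONTHS.index(mon) + 1:02d}-{int(day):02d}"
    if tim:
        hours, minutes = (int(part) for part in tim.strip().split(":"))
        if not (0 <= hours <= 23 and 0 <= minutes <= 59):
            raise ValueError(f"time out of range: {tim!r}")
        # The time is appended only when it was actually collected.
        iso += f"T{hours:02d}:{minutes:02d}"
    return iso


if __name__ == "__main__":
    print(to_dtc("05-MAR-2021", "14:30"))  # 2021-03-05T14:30
    print(to_dtc("05-MAR-2021"))           # 2021-03-05
```

When no time was collected, the result carries date precision only; nothing is padded with zeros.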
When mapping data to SDTM, if the relationship is similar but not direct, the letter "C" can be placed before the root variable name to signify a "collected" version. For instance, if injections are administered to various body parts like the LEFT THIGH or RIGHT ARM, the sponsor might create a variable called EXCLOC. Additionally, if an SDTM variable is not defined in the sponsor's version of SDTM, it can be included as a non-standard variable or supplemental qualifier. In cases where a study requires a field that lacks a corresponding definition in both CDASH and SDTM, a unique name should be created following the sponsor's business rules, utilizing CDASH naming fragments and CDISC Variable Naming Fragments when applicable, as outlined in SDTMIG v3.2 Appendix D.
EXPLANATION OF TABLE HEADERS IN THE CDASH MODEL AND CDASHIG METADATA TABLE
CDASH Model
This section provides an explanation of the columns used in the CDASH Model:
• Observation Class: This column contains the SDTM Class for the domain
• Domain: This column contains the 2-letter domain code
• Order Number: This column sequences the variables in the metadata table so that they appear in a specific order. The values carry no implied meaning, significance, or conformance expectations. Within each unique grouping of Observation Class and Domain, each variable is assigned a value that increases by one.
• CDASH Variable: This column provides the CDASH root variable names (e.g., --ONGO, --DAT)
• CDASH Variable Label: This column contains a suggested root variable label that may be used for the CDASHIG variable
• DRAFT CDASH Definition: This column presents a preliminary definition of the root variable, which may differ from the text in the SDTM. A new CDASH/SDTM team is working on creating standardized variable definitions. Once these definitions are finalized, the CDASH definitions will be revised to ensure consistency and alignment.
• Question Text: This column provides the recommended phrasing for data collection fields. Each Question Text is presented as a complete sentence. Any text within brackets [ ] should be substituted with the appropriate protocol-specific language, while text in parentheses ( ) is optional. Wording options separated by a forward slash allow sponsors to select their preferred phrasing.
• Prompt: This column provides suggested text for data collection fields as a concise version of the question. Any text within brackets [ ] should be replaced with specific wording as per the protocol, while text in parentheses ( ) is optional. Wording options separated by a forward slash allow sponsors to choose from multiple alternatives.
• Data Type: This column contains the simple data type of the CDASH variable (i.e., Char, Num, Date,
• SDTM Target: This column indicates the recommended mapping to the SDTM root variable. If a direct mapping to an SDTM root variable is not applicable, the column displays "N/A". If the column contains "SUPP--.QNAM", the value represented in the CDASH variable shall be mapped to an SDTM Supplemental Qualifier.
Note: CDASH variables noted as not having a direct map to SDTM variables (i.e., non-standard variables) may have SDTM variable equivalents in future versions
• Mapping Instructions: This column provides guidance on how to map the root variable to the SDTM variable. Variables that the SDTM represents in ISO formats are typically not collected in ISO form in CDASH; they are converted to the ISO format when the SDTM-based submission datasets are created.
• Implementation Notes: This column offers guidance on the rationale behind the implementation of CRF data collection fields, along with instructions for mapping CDASH variables to SDTM variables.
Note: When multiple options are contained in a single cell, the options are separated by a semicolon.
CDASHIG Metadata Table
This section provides an explanation of the columns used in the CDASHIG Metadata Table:
• Observation Class: This column contains the SDTM Class for the domain
• Domain: This column contains the 2-letter domain code
• Data Collection Scenario: This column specifies the different data collection options for the same domain and serves as a filter for the table. It provides the context for the CDASHIG Core Designations by indicating which fields should be on the Case Report Form (CRF) in each scenario. If only one data collection scenario is available for the domain, the column displays "N/A".
• Implementation Options: When this column contains "Horizontal-Generic", a sampling of the CDASHIG metadata is provided as a template for the metadata of the CRF in a denormalized structure
• Order Number: This column sequences the variables in the metadata table, suggesting the order in which CDASHIG variables should appear on a Case Report Form (CRF). The values carry no implied meaning, significance, or conformance expectations. Each value increments by one for every variable within a grouping defined by Observation Class, Domain, and Implementation Options.
• CDASHIG Variable: This column provides the CDASHIG variable names (e.g., CMONGO, AEDAT)
• CDASHIG Variable Label: This column provides the CDASHIG variable label
• DRAFT CDASHIG Definition: This column presents a preliminary definition of the CDASHIG variable, which may differ from the text found in the SDTMIG. A new CDASH/SDTM team is currently working on creating standardized variable definitions. Once these definitions are finalized, the CDASH definitions will be revised to ensure consistency and alignment with them.
• Question Text: This column offers suggested wording for the specific domain; implementers should refer to the CDASH Model to create alternative question text that adheres to CDASH conformance rules. Each Question Text is a complete sentence. Text in brackets [ ] should be substituted with protocol-specific language, while text in parentheses ( ) is optional. Wording separated by a forward slash indicates choices available for sponsors.
• Prompt: This column offers suggested prompt text for the specific domain; implementers may develop alternative prompt text in line with CDASH conformance rules. The prompt is a concise version of the question, with certain text enclosed in brackets [ ] or parentheses ( ). Any text within brackets should be substituted with protocol-specific language. A [NULL] entry signifies that a prompt may be omitted from the CRF screen/page if it is not needed.
• Data Type: This column contains the simple data type of the CDASH variable (i.e., Char, Num, Date,
• CDASHIG Core: This column contains the CDASHIG core designations for basic data collection fields (i.e., Highly Recommended (HR), Recommended/Conditional (R/C), Optional (O)). See Section 3.2, Core Designations for Basic Data Collection Fields.
• SDTMIG Target: This column suggests the mapping to the SDTMIG variable names, aiding in the creation of the necessary SDTMIG variables for submission. If there is no direct mapping available, the column shows "N/A". When the column shows "SUPP--.QNAM", the value in the CDASH field should be mapped to an SDTM Supplemental Qualifier.
Note: CDASHIG variables noted as not having a direct map to SDTMIG variables (i.e., non-standard variables) may have SDTM variable equivalents in future versions
• Mapping Instructions: This column outlines the recommended mapping of CDASHIG variables to SDTMIG variables, offering more detailed guidance than the CDASH Model. In the absence of domain-level metadata, refer to the CDASH Model for SDTM Mapping Instructions.
• Controlled Terminology Codelist Name: This column identifies the associated Controlled Terminology (CT) codelist, such as "LOC". While the SDTMIG specifies that certain variables, including dates, should follow ISO formats as CT, CDASH typically does not collect these variables in ISO CT. Instead, they are converted to the ISO format during the creation of SDTM-based submission datasets.
• Subset Controlled Terminology/CDASH Codelist Name: This column contains the CDISC Controlled
Terminology or CDASH Subset Codelist name that may be used for that specific variable (e.g.,
• Implementation Notes: This column offers guidance on the rationale and instructions for implementing CRF data collection fields, as well as for mapping CDASHIG variables to SDTMIG variables.
Note: When multiple options are contained in a single cell, the options are separated by a semicolon.
COLLECTION, CONVERSION, AND IMPUTATION OF DATES
Collection of Dates
When collecting dates, it is essential to ensure that sites only record the precision they are certain of. The system must store only the precision that has been collected, without making any assumptions. Incomplete dates should remain as such, without any imputation or the addition of "zero-filling" for missing components.
Data collection processes must accommodate partial dates and times, as these may represent the most accurate information available in certain cases. For instance, Section 7.3, DM - Demographics, notes that privacy regulations in some countries limit the collection of full birth dates, allowing only the year or the year and month. Additionally, in the CM and MH domains, individuals may not recall the exact date they began a medication or when a significant medical condition arose, making partial dates essential for accurate data collection.
When a complete date is available, the CDASH variable --DAT or its three components (--DATYY, --DATMO, --DATDD) should be utilized in the collection tool. For partial dates that can be entered in a single field, the CDASH --DAT variable is appropriate. However, if a partial date needs to be recorded in separate fields for year, month, and day, it is essential to follow the CDASH Model for standard naming conventions, such as --YY, --MO, --DD, and --TIM. The specific variable names required will depend on the capabilities of the software systems in use, such as EDC. CDASH mandates distinct data fields for dates and times, and when times are recorded, they should be paired with the corresponding date to generate the related SDTM date variable in ISO 8601 format.
Conversion of Dates for Submission
The SDTM date format allows this partial date to be submitted so the reviewer can see what was collected.
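As a rough illustration of the conversion described above, the following Python sketch assembles separately collected date components into an ISO 8601 string while keeping only the precision that was actually recorded. The function name `to_iso8601_partial` and its parameters are hypothetical and are not defined by CDASH or SDTM. The sketch covers only dates that are missing trailing components (for example, year and month without day) and rejects other patterns rather than guessing.

```python
from typing import Optional


def to_iso8601_partial(
    year: Optional[int] = None,
    month: Optional[int] = None,
    day: Optional[int] = None,
    time: Optional[str] = None,
) -> str:
    """Combine collected date parts into an ISO 8601 string without imputation.

    Hypothetical helper: only the components the site recorded are emitted.
    `time` is expected as already-collected "hh:mm" text.
    """
    collected = []
    gap_seen = False
    for value, width in ((year, 4), (month, 2), (day, 2)):
        if value is None:
            gap_seen = True
            continue
        if gap_seen:
            # A later component without an earlier one is out of scope here.
            raise ValueError("component recorded after a missing component")
        collected.append(f"{value:0{width}d}")

    date_text = "-".join(collected)
    if time is None:
        return date_text
    if len(collected) < 3:
        raise ValueError("this sketch pairs a time only with a complete date")
    return f"{date_text}T{time}"


print(to_iso8601_partial(2003, 12))               # year and month only
print(to_iso8601_partial(2003, 12, 15, "13:14"))  # complete date with time
```

No zero-filling or default day is applied at any point, so the output never claims more precision than the site recorded.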
Imputation of Dates
When conducting analysis, any missing components of the date may be imputed, resulting in the creation of imputed dates within the Analysis Data Model (ADaM); however, these imputed dates will not be included in the SDTM submission datasets.
MAPPING RELATIVE TIMES FROM COLLECTION TO SUBMISSIONS
Fixed Point in Time/Milestone
1 Define the fixed point in time or milestone (B or C). The fixed point in time or milestone can be a date or a description. This will map into the SDTMIG variables STTPT or ENTPT when the SDTM-based datasets are created.
2 Collect comparisons (D or E) using CDASHIG variables (e.g., “prior”, “ongoing”) of when something started or ended in relation to the fixed point in time or milestone (B or C). These CDASH variables are used to populate the SDTMIG variables STRTPT or ENRTPT when the SDTM-based datasets are created.
Note: These relative timing variables are only populated in the SDTM-based datasets when a date is not collected.
For information about mapping what is collected in “prior”, “ongoing”, and “continuing” fields into the appropriate SDTMIG variables, see SDTMIG v3.2 Section 4.1.4.7.
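The two steps and the note above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function `relative_timing`, its parameter names, and the values "BEFORE" and "ONGOING" assigned to STRTPT and ENRTPT are assumptions made for this example and should be confirmed against the SDTMIG section cited above.

```python
from typing import Dict, Optional


def relative_timing(
    start_date: Optional[str],
    end_date: Optional[str],
    prior: bool,
    ongoing: bool,
    start_reference: str,
    end_reference: str,
) -> Dict[str, str]:
    """Return relative timing values for one record (illustrative only).

    `start_reference` and `end_reference` stand for the fixed point in time
    or milestone, given as a date or a description.
    """
    out: Dict[str, str] = {}
    # Start side: populated only when no start date was collected.
    if start_date is None and prior:
        out["STTPT"] = start_reference
        out["STRTPT"] = "BEFORE"   # assumed value; confirm against the SDTMIG
    # End side: populated only when no end date was collected.
    if end_date is None and ongoing:
        out["ENTPT"] = end_reference
        out["ENRTPT"] = "ONGOING"  # assumed value; confirm against the SDTMIG
    return out


print(relative_timing(None, None, True, True, "SCREENING", "END OF STUDY"))
print(relative_timing("2019-03", "2019-04-02", True, False, "SCREENING", "END OF STUDY"))
```

When a date was collected, the corresponding relative timing variables are left unpopulated, which mirrors the note above.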
CDISC CONTROLLED TERMINOLOGY
Global regulators mandate submission data standards, which include the use of controlled terminology (CT). Implementing CT from the outset of data collection enhances traceability and transparency while minimizing issues related to converting legacy codelists and variables to meet submission standards. During data collection, CT can be used in the following ways:
1 To collect data using a standardized list of values (e.g., Mild, Moderate, Severe)
2 To ask a specific question on the CRF (e.g., Temperature)
3 To create a variable name in the database (e.g., TEMP for the collection of vital sign data when a unique variable name must be created for each vital sign result)
Terminology related to CDASH data collection fields is currently either in production or being developed by the CDISC Terminology Team. The production terminology is published by the National Cancer Institute's Enterprise Vocabulary Services (NCI EVS) and is accessible at [NCI EVS](http://www.cancer.gov/cancertopics/terminologyresources/CDISC).
When utilizing CDISC Controlled Terminology, it's important to note that certain values may still be in development or serve as provisional "best-guess" placeholders at the time of publication. Therefore, it is crucial to rely solely on the CDISC value set found in the NCI Thesaurus for accurate Controlled Terminology.
In some cases it is more appropriate to use a subset of a published SDTM terminology list, rather than the entire list.
To access a defined subset of the SDTM terminology, visit [NCI's CDASH terminology resource](https://www.cancer.gov/research/resources/terminology/cdisc). The CDASH codelists are derived from the comprehensive SDTM terminology lists, providing a streamlined method for implementers to establish codelists for data collection. It is essential for implementers to also conduct a thorough review of these resources.
Some codelists, like Laboratory Test Codes (LBTESTCD), are extensible, allowing the addition of values that are not already represented in the list, whether as a CDISC Submission Value, a synonym, or an NCI preferred term. In contrast, codelists such as AE Action Taken with Study Treatment are non-extensible and must be used as they are. When CDISC Controlled Terminology is unavailable, implementers should create sponsor-defined terminology to maintain consistency and transparency. Additionally, sponsors are encouraged to submit requests for adding values to controlled terminology through the Term Suggestion form found at http://ncitermform.nci.nih.gov/ncitermform/?versionsc.
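The difference between extensible and non-extensible codelists can be sketched as a simple validation rule. The `Codelist` class below is a hypothetical structure, not a CDISC artifact; the values shown are small illustrative subsets rather than the published codelists, and "XYZ1" is an invented sponsor term.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Codelist:
    """Hypothetical in-memory view of a codelist (illustrative only)."""
    name: str
    extensible: bool
    values: FrozenSet[str]

    def accepts(self, value: str, sponsor_terms: FrozenSet[str] = frozenset()) -> bool:
        """Published values always pass; sponsor terms pass only if extensible."""
        if value in self.values:
            return True
        return self.extensible and value in sponsor_terms


# Illustrative subsets only, not the full published lists.
lab_tests = Codelist("LBTESTCD", True, frozenset({"ALT", "AST"}))
action_taken = Codelist("ACN", False, frozenset({"DOSE NOT CHANGED", "DRUG WITHDRAWN"}))

sponsor_added = frozenset({"XYZ1"})  # invented sponsor-defined term
print(lab_tests.accepts("XYZ1", sponsor_added))     # extensible list
print(action_taken.accepts("XYZ1", sponsor_added))  # non-extensible list
```

A real implementation would also need to check that a proposed sponsor term does not duplicate an existing submission value, synonym, or preferred term, which this sketch does not attempt.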
In cases where a CDASH/CDASHIG variable has associated CT, the codelist is referenced in the Controlled Terminology column in the CDASH Model and CDASHIG Metadata Table in this format: (codelist name)
CDASH best practices offer essential operational recommendations for effective data collection and the development of case report forms (CRFs), emphasizing the importance of adhering to Section 4.1 for creating data collection instruments. These best practices streamline the clinical data flow from CDASH to the Study Data Tabulation Model (SDTM) by outlining key data collection methodologies. For further insights, refer to the Society for Clinical Data Management's Good Clinical Data Management Practices (GCDMP) standard available at https://scdm.org/gcdmp/.
BEST PRACTICES FOR CREATING DATA COLLECTION INSTRUMENTS
Num Best Practice Recommendation Rationale
1 When a binary response is expected, "Yes/No" responses are preferred over "Check all that apply", because a missing response could lead to a misinterpretation of critical data.
For example, if AEs are determined to be serious based only upon checking the applicable serious criteria (e.g., Hospitalization, Congenital Anomaly), failure to check a criterion would potentially delay identification of an SAE.
If an assessment has composite responses (e.g., presence or absence of 2 or more symptoms), "Yes/No" questions for each component response (e.g., symptom) are preferred to "Check all that apply" questions.
An exception applies to assessments in which most responses would be "No", such as the collection of ECG abnormality data. For instance, out of approximately 45 listed ECG abnormalities, only a limited number are likely to be relevant for a given patient.
Another exception is when a validated instrument contains checkboxes. In this case, they should remain checkboxes in the CRF or eCRF.
An important exception to the recommendation occurs when controlled terminologies dictate the values being collected For instance, when gathering RACE data using the "Check all that apply" option, it is essential to collect RACE values defined by controlled terminology as individual check boxes rather than as a simple "Yes/No" response.
In cases where the sponsor chooses to use "Check all that apply", additional quality checks should be considered (e.g., SDV) to ensure the data collected in the CRF are correct and complete.
"Yes/No" questions provide a definite answer. The absence of a response is ambiguous as it can mean "No", "None", or that the response is missing.
In situations where there is no other dependent or related field by which to gauge the completeness of the field in question, a "Yes/No" response is crucial for ensuring data completeness. For instance, if the AE End Date is missing, answering "Yes" to the question "Is the AE ongoing?" confirms that the data is complete. Conversely, if the end date is present, there is no need to respond with "No" to the same question.
2 The database should contain an indication that a planned exam/assessment was not performed. The mechanism for this may be different from system to system or from paper to EDC.
For example, the data collection instrument/CRF could contain a field that allows the site to record an indication that a Vital Sign assessment was not performed (e.g., VSPERF="N" or TEMP_VSSTAT="NOT DONE").
A "Yes/No – assessment completed" question is favored over a "Check if not done" box, except when the latter can be validated against a completed data field to ensure that at least one option contains data.
In cases where there are no related fields to assess the completeness of the field in question, it is essential to utilize a "Yes/No" response format to ensure clarity and eliminate any ambiguity.
• When another related field is present, the "Yes/No" response is optional. For example, when a value for temperature is missing, a simple "Not Done" box may be checked. It is not
This will provide a definitive indicator that a data field has missing data and has not been overlooked
This will prevent unnecessary data queries to clarify whether an assessment has been performed
The use of the "Yes/No" format helps to eliminate ambiguity about whether an assessment has been completed value is present
3 Data cleaning prompts should be used to confirm that blank CRFs are intentionally blank.
Typically, this will be a "Yes/No" question (e.g., AEYN); however, it may also appear as a "Check if blank" box if a validation check can confirm that either the box is checked or that data is recorded in the CRF.
This will provide a definitive indicator that a CRF is blank on purpose and has not been overlooked
This will prevent unnecessary data queries
4 The same data (i.e., the same information at the same time) should not be collected more than once Collecting the same data more than once:
Collecting demographic information, such as a subject's birthdate or age, can lead to discrepancies if not managed properly. For instance, while the birthdate is recorded on the Demographics page, it is not essential to gather age data on the Lab Case Report Form (CRF) at each visit.
• May affect frequency counts and analysis results
5 A "Check if ongoing" question is recommended to confirm ongoing against an end date
This is a special-use case of "Yes/No", where the data entry field may be presented as a single possible response of
When the box is checked, the operational variable can be assigned a value of "Yes." Conversely, if the box remains unchecked but the End Date is filled in, the operational variable can be assigned a value of "No."
For certain EDC systems, using radio buttons to present the possible responses to the "Check if ongoing" question can enhance user experience. By implementing conditional logic, the system can prompt for the end date only when the user indicates that the event is not ongoing.
In the "Check if ongoing" use case, data is deemed "clean" when one of the two responses is provided while the other remains blank. Specifically, the presence of an end date confirms that the event is not ongoing.
6 CRFs should use a consistent order of responses (e.g., "Yes/No") from question to question, for questions with response boxes or other standardized lists of values.
Exceptions to this would be cases where a validated instrument (e.g., a standardized assessment questionnaire) is used.
A consistent arrangement of response boxes enhances the usability of the Case Report Form (CRF), minimizing data entry errors and preventing bias that could influence the investigator's responses.
7 CRF questions and completion instructions should be unambiguous, and should not "lead" the site to answer the question in a particular way.
To ensure the integrity of study data, it is crucial to collect information without introducing bias or errors. This requires formulating clear and unambiguous questions, along with comprehensive answer options, including choices like "Other" and "None" when relevant.
8 CRF questions should be as self-explanatory as possible, thereby reducing the need for separate instructions.
Short instructions can be included on the CRF page when the Prompt lacks specificity, while more comprehensive guidance can be found in the CRF completion guideline. It is essential that all instructions remain concise for clarity.
Instructions should be standardized as much as possible.
Incorporating concise instructions and prompts on the Case Report Form (CRF) enhances the likelihood of compliance, minimizes queries, and lowers overall data cleaning expenses.
Having standard instructions supports all sites using the same conventions for completing the fields
To enhance clarity and reduce the number of pages in the Case Report Form (CRF), it is beneficial to provide concise instructions and prompts while relocating detailed instructions to a separate booklet, facing page, or checklist. This approach streamlines the CRF and improves usability.
• Decreased CDM costs (e.g., decreased data entry costs)
• Allows CRF to be formatted so that the reader can easily identify the fields to be completed
• The format of the page is less cluttered, which makes it easier for site personnel and monitors to identify fields with missing responses
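The completeness rules in best practices 2 and 5 of the table above lend themselves to simple edit checks. The following Python sketch is one possible formulation: the function names, the "clean"/"query" labels, and the "Y"/"N" operational values are assumptions for illustration, not CDASH requirements.

```python
from typing import Optional


def exactly_one(first: bool, second: bool) -> bool:
    """True when one, and only one, of the two conditions holds."""
    return first != second


def not_done_check(result: Optional[str], not_done_checked: bool) -> str:
    """Best practice 2: a planned assessment has a result or a 'Not Done' mark."""
    has_result = result is not None and result.strip() != ""
    return "clean" if exactly_one(has_result, not_done_checked) else "query"


def derive_ongoing(end_date: Optional[str], ongoing_checked: bool) -> Optional[str]:
    """Best practice 5: derive the operational ongoing value from the form.

    Returns "Y" or "N" when the data are clean, or None when both fields are
    blank or both are filled in, which would warrant a query.
    """
    has_end_date = bool(end_date)
    if not exactly_one(has_end_date, ongoing_checked):
        return None
    return "Y" if ongoing_checked else "N"


print(not_done_check("36.6", False))        # result present, box unchecked
print(not_done_check(None, False))          # nothing recorded
print(derive_ongoing(None, True))           # box checked, no end date
print(derive_ongoing("2020-01-05", False))  # end date present, box unchecked
```

Both checks rest on the same idea: the two related fields are mutually exclusive, so the record is complete only when exactly one of them carries a value.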
CRF DESIGN BEST PRACTICES
The following recommendations are general principles that may be implemented during CRF form design and/or database set-up in different ways, depending on the systems used.
Ensuring a consistent and clinically logical order for data fields at the clinical site will streamline data entry, ultimately leading to more reliable results. The Case Report Form (CRF) must be designed for efficiency, allowing site personnel to complete it quickly and easily.
Clinical Operations staff should review the CRF for compatibility with common site workflow and site procedures.
To enhance user experience and streamline data collection, position fields that frequently appear across various forms at the top. For instance, when collecting information such as the date and time, ensure that the collection date is placed first, followed by the time, on every relevant form.
When designing a clinical assessment form, it is essential to arrange the fields in the order they will be collected. Additionally, incorporating fields from various domains is permissible as long as it aligns with the clinical workflow.
In a clinical study, related fields for a single encounter should be grouped together, even if multiple time points or visits are recorded on the same form. For instance, when heart rate and temperature are measured hourly over four hours on study day 1, the form can capture data sequentially for each hour, including heart rate and temperature results along with their respective units. Each time point within study day 1 should be clearly labeled so that the collected data stay organized.
When presenting test results, it's essential to group related fields together, ensuring that results are displayed alongside their corresponding units. For instance, the temperature result labeled "TEMP" should be immediately followed by its acceptable units, such as "F" and "C." This practice enhances clarity and improves the overall readability of the data.
In some cases, the result might have only 1 applicable unit. For example, the only applicable unit for "PULSE" is "beats/min". The unit should be displayed on the CRF and databased.
5 Data fields that are dependent on other data fields should be placed in the CRF in such a way that this dependence is obvious.
In an electronic data capture (EDC) system, when a questionnaire includes an "Other, specify" option, it is essential to position the text box for additional input close to this option. This placement clarifies that the text box is a subpart of the "Other" question. For instance, a system may necessitate a specific answer to trigger one or more related follow-up questions.
6 Lists of values that have a logical order should be provided on the CRF in that logical order. For example, the values of “Low”, “Medium”, and “High” are logically placed in this order. Do not list “Medium” first, “Low” second, and “High” third.
ORGANIZATIONAL BEST PRACTICES TO SUPPORT DATA COLLECTION
Num Best Practice Recommendation Rationale
CRFs should focus on collecting only the data that support protocol objectives and endpoints.
The protocol should clearly state which data will be collected in the study.
Data collected on the Case Report Form (CRF) should primarily focus on efficacy analysis and safety assessment of the investigational product, considering the associated costs and time required for data collection and processing. Additionally, certain fields on the CRF may be included to enhance Electronic Data Capture (EDC) functionality and facilitate data review and cleaning through automated edit checks.
The protocol and accompanying SAP must be thoroughly reviewed to guarantee that all necessary parameters for analysis are collected and easily analyzable. It is the statistician's responsibility to ensure that the Case Report Form (CRF) captures all data required to support the analysis.
2 CRF development should be a controlled, documented process that incorporates (as applicable):
CRF development should be controlled by SOPs covering these topics, as well as site training
A controlled process for developing CRFs will help ensure that CRFs comply with company standards and processes
3 The CRF design process should include adequate review and approval steps, and each reviewer should be informed on the scope of the review they are expected to provide
The team responsible for designing data collection instruments for a study must actively participate in developing the protocol and ensure that the Case Report Form (CRF) design team includes members with the necessary expertise.
Medical and scientific experts must deliver comprehensive information to ensure that clinical data standards staff, subject matter experts, and clinical data management personnel fully grasp the background, context, and medical significance of efficacy and safety data.
Clinical data management, along with standards subject matter experts and CRF designers, must thoroughly review the protocol to confirm that the proposed data collection methods are feasible. Additionally, they should ensure that the appropriate standards are utilized in the development process.
• Statisticians should review the CRF against their planned analyses to make sure all required data will be collected in an appropriate form for those analyses
• Clinical operations staff should review the CRF to make sure the questions are unambiguous and that requested data can be collected
• Programmers should review the CRF to ensure that the manner in which the data are collected on the CRF is consistent with relevant metadata standards
• Regulatory experts should review the CRF for compliance with all applicable regulations
Involving reviewers from various functions enhances the likelihood of completing the Case Report Form (CRF) efficiently, while also reinforcing the evaluation of safety and efficacy as outlined in the protocol and Statistical Analysis Plan (SAP).
The CRF design team must ensure that data collection aligns with the implementer's processes and is user-friendly for site completion, facilitating accurate data entry.
• Pharmacovigilance personnel should review to ensure appropriate data capture and process to support expedited reporting
Ideally, the CRF should be developed in conjunction with the protocol (and the SAP if it is available)
All research-related data on the CRF should be addressed in the protocol to specify how and when it will be collected
Translations of Case Report Forms (CRFs) into other languages must be conducted through a controlled process by experts familiar with both the study questions and the target language and culture. This translation process should follow a parallel approach, incorporating distinct reviews and approvals from qualified professionals. Additionally, it may necessitate author approval and a separate validation of the translated instrument to ensure accuracy and cultural relevance.
CRFs that are translated into other languages should follow the same development process as the original CRF to ensure the integrity of the data collected.
Consideration of translation should be part of the CRF development process. Avoid the use of slang or other wording that would complicate or compromise translation into other languages.
Cultural and language issues should be addressed appropriately during the process of translating CRFs to ensure the CRF questions have consistent meaning across languages.
5 Data that are collected on CRFs should be databased
In certain fields, like "Were there any Adverse Events," the response—either "Yes" or "No"—may require databasing; however, this information will not be part of the submitted data.
Some fields, such as Investigator’s Signature, can be verified by the data entry staff, but an actual signature may not be databased unless there is an e-signature
When specific data is not necessary for the Case Report Form (CRF) but is essential for assisting the investigator or monitor, it should be documented on a site worksheet, such as an entry criteria worksheet or a dose titration worksheet.
All such site worksheets should be considered source documents or monitoring tools, and should be maintained at the site with the study files
6 Establish and use standardized case report forms.
Using data collection standards across compounds and TAs saves time and money at every step of drug development.
• Reduces production time for CRF design and reduces review and approval time
• Reduces site re-training and queries and improves compliance and data quality at first collection
• Facilitates efficient monitoring, reducing queries
• Improves the speed and quality of data entry due to familiarity with standards and reduces the training burden in-house
• Enables easy reuse and integration of data across studies and facilitates data mining and the production of integrated summaries
• Reduces the need for new clinical and statistical programming with each new study
GENERAL RECOMMENDATIONS ON SCREEN FAILURES
Screen failure data should be collected for individuals who do not pass the screening and are not enrolled in the study, if the sponsor chooses to do so. According to Section 10.1 of ICH E3, which outlines the structure and content of clinical study reports, it is pertinent to report the number of patients screened and the reasons for their exclusion, as this information can help define the suitable patient population for future drug usage. While not all studies may require screen failure data, it is advisable to gather this information according to the specific needs of the study protocol and the drug in question.
When utilizing CDASH, it is essential to collect minimum data, including a subject identifier and reasons for screen failure, typically noted on the End of Study form as "Screen Failure." This data facilitates the overall summarization of all screened and enrolled subjects, enhancing accountability in the clinical study report. Additionally, other relevant information may be gathered, such as the date of informed consent, sex, race, date of birth or age, and specific details regarding ineligibility, like out-of-range lab values.
The SDTMIG does not specify a distinct domain for screen failure data and does not mandate its inclusion in the SDTM. If screen failure data is submitted, it should be incorporated into the relevant SDTMIG domains. For additional guidance on the submission of screen failure data, please consult the SDTMIG.
CONFORMANCE RULES
1 Core designations must be followed. All Highly Recommended and applicable Recommended/Conditional Fields must be present in the case report form (CRF) or available from the operational database.
2 To ensure accurate data collection in the CRF, it is essential to utilize CDISC Controlled Terminology as outlined in the CDASHIG Metadata Table. All codelists in the CRF must align with the latest published CDISC Controlled Terminology submission values. Subsets, such as those provided in CDASH terminology, are acceptable for use. Additionally, in Findings domains, relevant values from CDISC Controlled Terminology should be incorporated into Question Text, Prompts, and variable names. For instance, if querying a subject's height, use "Height" from the VSTEST codelist as the Prompt and "HEIGHT" from VSTESTCD for the variable name.
3 Best practices must be followed. The design of the CRF must follow guidance in Section 4.1, Best Practices for Creating Data Collection Instruments, and Section 4.2, CRF Design Best Practices.
Standardization of CRF question wording is essential, utilizing CDASH Question Text or Prompts. When data collection is denormalized, relevant CDISC Controlled Terminology should be incorporated into the Question Text, with the flexibility to use synonyms that map to CDISC Submission Values when necessary. For instance, "ALT" is preferred over "Alanine Aminotransferase" for lab test prompts. In cases where no controlled terminology exists, the implementing organization must standardize the Question Text for consistency. CDASH aims to minimize variability between CRFs and promote semantic interoperability, hence Question Text must be used verbatim. Additionally, familiar synonyms can replace SDTMIG variables on CRFs without compromising conformance, such as displaying SEX as "Male" or "Female" while using "M" and "F" in the operational database. CDASH also allows for flexibility in Question Text tense and requires that translations for cultural or linguistic reasons maintain semantic consistency with the original text. If more specific questions are needed beyond the provided prompts, a brief CRF Completion Instruction is recommended to clarify data requirements without altering the variable's intended meaning.
To ensure efficient mapping of collected data to SDTM datasets, it is essential to achieve end-to-end traceability of variable names from the data capture system to the SDTM datasets, facilitating the automation of electronic data capture (EDC) setup and subsequent processes. In EDC systems, variable names for data collection fields and their corresponding names in the underlying database may incorporate various system components as part of the item identifier. Before exporting data in a defined format, EDC systems may necessitate the inclusion of database references, such as the EDC page name or item group name. Additionally, when data collection is conducted in a denormalized manner, appropriate CDISC Controlled Terminology should be utilized, as seen in the collection of Vital Signs results, where variable names can be derived from the Vital Signs Test Code codelist, such as using "TEMP" for temperature results.
Systolic Blood Pressure results can be captured using the variable names SYSBP or SYSBP_VSORRES. When system constraints limit variable names to 8 characters, it is essential to maintain a consistent approach that reflects either the normalized root variable or the controlled terminology. While all variable names defined by CDASHIG are 8 characters or fewer to meet SDTM limits, the maximum length of variable names is determined by the data management system, not by CDASH. When data are collected horizontally, it is advisable to create denormalized CDASH variables that incorporate the SDTMIG variable name and controlled terminology within the CDASH variable names, which eases the transition to SDTM datasets. The "Horizontal-Generic" domain-level metadata in the CDASHIG tables serves as an example of this implementation, although it does not imply a conformance requirement.
For optimal efficiency, data exported from the operational database into an SDTMIG variable should ideally require no additional processing when the CDASHIG and SDTMIG variable names match. This allows SDTM data programmers to confidently assume that the data is compliant with SDTMIG standards. Minor adjustments, such as case changes, do not impact compliance, ensuring high-quality deliverables even for programmers who may not be well-versed in data capture practices.
To ensure the validity of validated questionnaires, ratings, or scales, it is essential to present the questions and response options exactly as they were originally validated. Adhering to this format is crucial, even if it leads to Case Report Forms (CRFs) that do not align with CDASH best practices. Restructuring these questionnaires could compromise their validity, and using them in their original format should not be viewed as a deviation from CDASH compliance.
Implementers should identify necessary additional data fields to meet specific study and therapeutic area requirements, as well as relevant regulatory and business practices. For guidance on creating new data collection fields not defined in the CDASHIG, refer to Section 3.4, which outlines the process for developing these fields in the implementation guide.
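The naming and wording rules above can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than a prescribed implementation: the helper names, the fallback to the test code alone when a length limit applies, and the dictionary layout are all hypothetical, and the test code and test name pairs are a small illustrative subset of the Vital Signs terminology.

```python
from typing import Dict, Optional

# Illustrative subset: Vital Signs test code -> test name used as the Prompt.
VS_TESTS: Dict[str, str] = {
    "HEIGHT": "Height",
    "TEMP": "Temperature",
    "SYSBP": "Systolic Blood Pressure",
}

# Familiar display text on the CRF mapped to the stored value.
SEX_DISPLAY_TO_STORED: Dict[str, str] = {"Male": "M", "Female": "F"}


def denormalized_name(
    test_code: str, sdtm_variable: str, max_length: Optional[int] = None
) -> str:
    """Build a horizontal variable name such as SYSBP_VSORRES.

    If the system enforces a length limit, fall back to the test code alone
    (for example SYSBP). This fallback is one consistent option, not a rule.
    """
    name = f"{test_code}_{sdtm_variable}"
    if max_length is not None and len(name) > max_length:
        name = test_code
    if max_length is not None and len(name) > max_length:
        raise ValueError(f"{test_code} does not fit in {max_length} characters")
    return name


def field_definition(test_code: str) -> Dict[str, str]:
    """Pair the variable name with the Prompt taken from the terminology."""
    return {
        "variable": denormalized_name(test_code, "VSORRES"),
        "prompt": VS_TESTS[test_code],
    }


print(denormalized_name("SYSBP", "VSORRES"))                # no length limit
print(denormalized_name("SYSBP", "VSORRES", max_length=8))  # 8-character limit
print(field_definition("HEIGHT"))
print(SEX_DISPLAY_TO_STORED["Female"])
```

Keeping the test code as the root of every collected variable name is what preserves traceability: the same code reappears in the SDTM dataset, so the mapping can be automated.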