
Methods of sample size calculation for clinical trials




DOCUMENT INFORMATION

Title: Methods of Sample Size Calculation for Clinical Trials
Author: Michael Tracy
Supervisor: Stephen Senn, Academic Supervisor
Document type: Thesis
Pages: 123
Size: 875.02 KB

Structure

  • 1.2 Clinical trials and the importance of sample size

Contents

Introduction

This thesis aims to explore the theories behind sample size calculations in various clinical trial types and to develop practical computer software for statisticians facing related challenges. It focuses on creating tools for calculating meaningful sample sizes and powers, particularly in scenarios with uncertain endpoint variances or unconventional trial designs. The first chapter provides background on power and sample size calculations across different data types. Subsequent chapters highlight the need for new sample size calculation programs that are user-friendly while aligning with existing methods. Chapter 4 examines the assumption that the sample variance accurately estimates the true variance for sample size estimation, and proposes solutions for identified flaws. Additionally, it addresses uncertainty in binary data studies and suggests coping methods. The chapter concludes with the development of software to implement these remedies.

This chapter explores the relationship between power and sample size in clinical trials, highlighting the key factors that influence them. It will discuss various types of clinical trials where sample size calculations are essential and analyze different methods for determining statistical power.

Clinical trials and the importance of sample size

Clinical trials are essential research studies designed to assess new medical treatments. Before a new therapy can be marketed, it must demonstrate acceptable safety and proven effectiveness to both drug companies and regulatory authorities. These trials play a crucial role in the development of new drugs and in discovering new applications for existing medications.

Clinical trials are costly and time-consuming endeavors, often exceeding $500 million and spanning several years. To effectively compare the efficacy of various drugs, dosages, surgeries, or treatment combinations, it is crucial to design trials that maximize the likelihood of demonstrating a treatment effect. Generally, increasing the number of participants enhances the probability of identifying significant differences in treatment effects, ensuring that observed variations are attributable to genuine treatment efficacy rather than random fluctuations.

In the US, over 40,000 clinical trials are currently seeking participants, leading to significant financial pressures as researchers offer substantial incentives for recruits amidst rising operational costs. Balancing the need for a robust sample size to identify treatment effects with ethical considerations is crucial, especially when existing therapies already enhance the quality of life for patients. It may be ethically questionable to enroll more participants in a trial for a new treatment if it risks providing them with inferior care. Therefore, the sample size must carefully consider the clinical, financial, and ethical requirements of sponsors, participants, and future patients.

Power

In statistics, the power of a test refers to the likelihood of correctly rejecting a false null hypothesis. In the context of this thesis, the power of a trial design or comparison of treatment effects is defined as the conditional probability that a statistical analysis will successfully demonstrate a significant advantage of one treatment over another, provided that a true superiority of a specified magnitude exists.

To better understand the concept of power, consider the world as idealised in hypothesis testing.

In hypothesis testing, a null hypothesis (H0) and an alternative hypothesis (H1) are established as logical opposites, where one is entirely true and the other is entirely false. Statistical analysis is conducted on the data related to these hypotheses, leading to the acceptance or rejection of the null hypothesis. If H0 is rejected, it results in the acceptance of the alternative hypothesis H1.

There are four possible states of the world:

H0 is actually true, and is correctly not rejected.

H0 is actually true, and is wrongly rejected in favour of H1, a Type I error.

H0 is actually false, and is wrongly not rejected, a Type II error.

H0 is actually false, and is correctly rejected in favour of H1.

                 H0 not rejected                  H0 rejected
H0 is true       Correct to not reject            Type I error (probability α)
H0 is false      Type II error (probability β)    Correct to reject

In statistical hypothesis testing, the probability of making a Type I error (α) represents the likelihood of incorrectly asserting that a relationship or difference exists when it does not, thereby misvalidating a theory. Conversely, the probability of a Type II error (β) reflects the chance of failing to identify a true relationship or difference, resulting in the rejection of a valid theory. Understanding these probabilities is crucial for accurate hypothesis testing and theory validation.

In statistical analysis, 1-β represents the power of a test, indicating the likelihood of correctly identifying a relationship or difference when it truly exists. A higher power enhances the confidence of trial designers in detecting hypothesized differences in treatment effects, making it a crucial factor in experimental design.
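As a concrete illustration of this definition, power can be estimated by simulation: generate many trials in which a true treatment difference really exists, and count how often the test rejects H0. The sketch below is illustrative only; the effect size, standard deviation, and sample size are made-up numbers, not taken from the text.

```python
import random
from statistics import mean

def simulate_power(delta, sigma, n_per_arm, z_crit=1.96, n_sims=2000, seed=42):
    """Monte Carlo estimate of power for a two-arm parallel trial: simulate
    many trials where the true difference is delta, and count how often a
    one-sided two-sample z-test rejects H0."""
    rng = random.Random(seed)
    se = sigma * (2.0 / n_per_arm) ** 0.5  # standard error of the mean difference
    rejections = 0
    for _ in range(n_sims):
        arm_a = [rng.gauss(0.0, sigma) for _ in range(n_per_arm)]
        arm_b = [rng.gauss(delta, sigma) for _ in range(n_per_arm)]
        if (mean(arm_b) - mean(arm_a)) / se > z_crit:
            rejections += 1
    return rejections / n_sims

# With a true difference of 1 standard deviation and 20 subjects per arm,
# the analytic power is about 0.885; the simulated estimate lands nearby.
print(simulate_power(delta=1.0, sigma=1.0, n_per_arm=20))
```

Repeating the simulation with delta set to 0 would instead estimate the Type I error rate α, since every rejection under H0 is then a false positive.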

The types of trial of interest

Superiority trials, Equivalence Trials, and Non-inferiority trials

Superiority trials aim to demonstrate that one treatment is more effective than another, while non-inferiority trials are designed to establish that a new treatment's effect is not significantly worse than that of an existing treatment, within a predetermined margin.

Equivalence trials aim to determine whether the compared treatments differ by no more than a minimal, predefined margin. In contrast, non-inferiority trials do not genuinely seek to prove "non-inferiority" in the strict sense, since demonstrating that one treatment is no worse at all than another is the province of superiority trials. Rather, non-inferiority trials aim to show that a new treatment is, at most, only slightly inferior to a comparison treatment, by a clinically insignificant amount.

The hypotheses that the investigator would like to establish for each type of trial are

H1 (non-inferiority): Effect A - δ < Effect B, where δ is a clinically significant amount, and treatment B is a new treatment that is compared to the existing treatment A.

Superiority trials are employed when a new treatment is expected to improve on existing therapies, while non-inferiority trials are utilized when new treatments share technical similarities with current options, or when existing treatments demonstrate moderate to significant effects or specific safety concerns. Demonstrating superiority is generally more challenging than proving non-inferiority, necessitating larger subject populations for superiority trials.

Regulatory authorities vary in their acceptance of non-inferiority trials relative to superiority trials, with the result that certain drugs are approved in some regions but not in others.

In this thesis the focus will be on superiority trials, and all formulae, trial designs, p-values, etc. are for superiority.

The thesis is about ways of determining the correct sample size to achieve power in randomised controlled parallel and crossover superiority trials with one outcome variable.

In parallel group trials, each subject is randomised to receive one of the treatments to be compared, and stays on that single treatment for the duration of the trial.

Senn (2002) defines a crossover trial as “one in which subjects are given sequences of treatments with the object of studying differences between individual treatments (or sub-sequences of treatments)”.

The terms 'crossover', 'cross-over', 'cross over', 'change-over', 'changeover', and 'change over' refer to the same trial design and have been used in various recent publications. For consistency throughout this thesis, I will use the term 'crossover', as it is the most widely recognized.

The AB/BA design is the most basic form of crossover trial, where participants receive two treatments across two periods: first either treatment A or B, followed by the other treatment in the second period. In many studies, the term 'crossover' typically refers to this AB/BA design. This approach aims to minimize between-subject variation by allowing each participant to serve as their own control, thereby enhancing the visibility of treatment effects in the results. Consequently, fewer subjects may be required compared to parallel trials to determine if a significant difference in treatment effects exists.
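The "own control" argument above can be checked with a small simulation. The sketch below uses illustrative parameter values (not from the text) to compare the sampling variance of the treatment-effect estimate in a parallel trial against an AB/BA-style crossover using the same 20 subjects:

```python
import random
from statistics import mean, variance

def simulate_estimates(sigma_b=1.0, sigma_w=1.0, n=20, reps=4000, seed=1):
    """Compare treatment-effect estimators: a parallel trial (half the
    subjects per arm) versus a crossover where all n subjects get both
    treatments.

    Each subject has a random 'personal level' (the between-subject effect).
    In the crossover that level cancels in the within-subject differences.
    The true treatment effect is set to 0, so both estimators have mean 0
    and their sampling variances can be compared directly.
    """
    rng = random.Random(seed)
    parallel_ests, crossover_ests = [], []
    for _ in range(reps):
        levels = [rng.gauss(0.0, sigma_b) for _ in range(n)]
        # Parallel: first half on treatment A, second half on treatment B
        y_a = [levels[i] + rng.gauss(0.0, sigma_w) for i in range(n // 2)]
        y_b = [levels[i] + rng.gauss(0.0, sigma_w) for i in range(n // 2, n)]
        parallel_ests.append(mean(y_b) - mean(y_a))
        # Crossover: within-subject difference, so the subject level cancels
        diffs = [rng.gauss(0.0, sigma_w) - rng.gauss(0.0, sigma_w) for _ in range(n)]
        crossover_ests.append(mean(diffs))
    return variance(parallel_ests), variance(crossover_ests)

var_parallel, var_crossover = simulate_estimates()
print(var_parallel, var_crossover)  # crossover variance is about 4x smaller here
```

With equal within- and between-subject standard deviations, the crossover estimator's variance is roughly a quarter of the parallel estimator's: half from removing between-subject variation, and half from every subject contributing to both arms.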

The AB/BA design is just one of many crossover trial types; researchers can create trials with multiple treatments and periods. Statistically, the most efficient design involves each participant receiving all treatments, resulting in an equal number of treatments and periods. Alternatively, an incomplete blocks crossover trial allows subjects to receive some, but not all, treatments. An illustrative example of this is found in Vollmar et al (1997), which details a complex design featuring seven treatments and five periods to compare asthma remedies.

An incomplete blocks design may be preferred over a complete block design for various reasons, particularly when sponsors face ethical or financial constraints that limit the duration of a trial. This design typically requires fewer periods than a comparable complete block design, resulting in a shorter overall trial time. However, since participants do not receive all treatments, this approach only partially mitigates the uncertainty stemming from patient variation, offering some advantages of a complete crossover design while still leaving room for variability.

Sample size and power calculations for Normal data

The effectiveness of a trial relies heavily on both its design and the chosen statistical analysis method. Crossover trials can be analyzed using Ordinary Least Squares (OLS) or by considering subjects as a random effect, leading to varying outcomes, especially in incomplete block designs.

It is assumed that any observed difference in outcomes is a combination of treatment effect, ‘period effect’, pre-existing differences between subjects and random variation of the outcome variable.

To accurately calculate the required sample size for a trial, it is essential to first determine the type of analysis to be conducted and select the values for α, β, and δ, which represents the size of the relevant difference. Additionally, the variance of the outcome variable must be established; for crossover designs, distinguishing between within-subject and between-subject variance can enhance estimation accuracy. In parallel designs, this distinction is unnecessary as each patient contributes only one outcome, while in complete block designs the results are influenced solely by within-subject variation, making it crucial to identify the proportion of variance that affects the results.

Knowing how the variance divides between these sources can enhance variance estimation and yield a more robust power estimate. By utilizing incomplete blocks designs that differentiate sources of variance, statisticians can recover between-subject information, facilitating a more powerful random effects analysis. It is important to note that, in this context, fixed effects analysis treats subjects as fixed effects, whereas random effects analysis treats subjects as random effects.

When S is the test statistic, the relationship between δ and S will be [Julious SA 2005]

1-β = Φ( δ/√Var(S) - z1-α )

This fundamental equation serves as the cornerstone for deriving sample size and power equations for binary, ordinal, and Normal data, as highlighted by Machin (1997). It establishes a relationship between sample size and the other parameters, allowing a power equation to be derived for the specific trial type. For Normal data, the statistic S represents the difference between means; in a parallel trial with two treatments, the variance of S is σ²(1/nA + 1/nB), where nA and nB are the numbers of subjects assigned to treatments A and B and σ² is the population variance, which is commonly assumed to be equal across the treatment groups in the analysis [Julious SA 2005]. With these results, a link between sample size and power can be established.

The formulae for power for crossover and parallel, complete and incomplete blocks, fixed effects analysis and random effects analysis can be unified in a generalised form of equation.

The power of a trial to detect a treatment contrast can be calculated from a cumulative Normal distribution in the form

1-β ≈ Φ(∆ - t1-α,df)     (1.2)

where ∆ is the ratio of the treatment difference δ to the standard error of the estimated treatment contrast, and df is the degrees of freedom of the analysis. The value of ∆ is influenced by factors such as trial design, the number of sequence repetitions, and the variances observed both between and within subjects.

But because any analysis would use the observed variance s², not the true variance σ², the power is more accurately calculated from a cumulative non-central t distribution in the form

1-β = 1-pt(t1-α,df, df, ∆)     (1.3)

where ∆ is used as the non-centrality parameter [Senn 2002, Julious 2005].

An investigator plans to conduct an AB/BA crossover trial to evaluate the superiority of new drug B over drug A. The clinically significant difference is set at 1, with both within-subject and between-subject standard deviations also at 1. The investigator seeks to determine the trial's power to detect this difference, assuming an enrollment of 20 subjects, 10 assigned to each sequence. What is the power when the one-sided alpha is 0.025?

In a crossover design analysis, the degrees of freedom for assessing the power of a contrast is calculated using the formula (R×S×P) - (R×S) - P - T + 2. Here, S represents the number of subjects required per block, R denotes the number of block repetitions, P indicates the number of periods, and T refers to the number of treatments. For this example, with R=10, S=2, P=2 and T=2, the degrees of freedom come to df = 40 - 20 - 2 - 2 + 2 = 18.

According to our preferred method using equation 1.3, the calculated power is 0.84844. In contrast, applying formula 1.2 yields a marginally higher power of 0.85574. While formula 1.3 provides the accurate result, formula 1.2 produces results that are relatively close.
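Formula 1.2's value for this example can be reproduced in a few lines. The sketch below is not the thesis's SAS program: it assumes one common parametrisation of the AB/BA contrast, with estimator variance (σd²/4)(1/n1 + 1/n2) where σd² = 2σw², and it hard-codes the t quantile rather than computing it.

```python
from statistics import NormalDist

def power_abba_approx(delta, sigma_w, n_per_seq, t_crit):
    """Normal-approximation power (formula 1.2) for the AB/BA contrast.

    Assumes the contrast estimate has variance (sigma_d^2 / 4)(1/n1 + 1/n2)
    with sigma_d^2 = 2 * sigma_w^2 -- a common parametrisation of the AB/BA
    estimator, consistent with the numbers in this example.
    """
    var = (2 * sigma_w ** 2 / 4) * (1 / n_per_seq + 1 / n_per_seq)
    big_delta = delta / var ** 0.5  # the quantity called Delta in the text
    return NormalDist().cdf(big_delta - t_crit)

# df = (R*S*P) - (R*S) - P - T + 2 = 40 - 20 - 2 - 2 + 2 = 18 for this design;
# t_{0.975,18} is hard-coded here (about 2.1009) rather than computed.
print(power_abba_approx(delta=1, sigma_w=1, n_per_seq=10, t_crit=2.1009))
# about 0.85574, formula 1.2's value; the exact non-central t result is 0.84844
```

Replacing the final normal CDF with a non-central t CDF (e.g. `scipy.stats.nct.cdf`) turns this into formula 1.3 and recovers the slightly lower exact power.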

In this thesis, Formula 1.3 is primarily utilized for power and sample size calculations involving normally distributed data. However, Formula 1.2 offers the benefit of easier manipulation, which is advantageous when addressing uncertainties in the standard deviation estimate starting from Chapter 4.

Sample size and power calculations for binary data

In analyzing binary data, formulating the null hypothesis (H0) and the alternative hypothesis (H1) is less straightforward than with Normal data. This thesis focuses on two key methods for summarizing the differences between treatments: the Odds Ratio (OR) and the Absolute Risk Reduction (ARR). The probabilities of a binary outcome for treatments A and B are represented as pA and pB, respectively, with values ranging from 0 to 1. The Odds Ratio provides a comparative measure of the treatment effects expressed as a ratio:

OR = [pA(1-pB)] / [pB(1-pA)]

Under the Odds Ratio formulation, H0 would be log(OR) = 0.

Different types of hypotheses make distinct claims about the world and require varied statistical analyses Additionally, there are ongoing debates regarding the best methods for data analysis once the hypothesis is framed.

The power of a trial is influenced by factors such as sample size, trial type, pA, and the clinically significant pB or OR. In the context of parallel trials, this thesis will utilize two power calculation methods: the proportional difference method, suitable for assessing an absolute risk hypothesis, and the Odds Ratio method.

In general, for the trial to achieve the required power, the variance of the measured effect must satisfy

Var(S) ≤ δ² / (z1-α + z1-β)²

The variance of the log-odds ratio can be approximated as [Julious SA 2005]

Var(log OR) ≈ 1/(nA pA(1-pA)) + 1/(nB pB(1-pB))

From these, the power for a proportional difference trial can be approximated as

1-β ≈ Φ( |pA - pB| / √(pA(1-pA)/nA + pB(1-pB)/nB) - z1-α )

and the odds ratio power approximated as

1-β ≈ Φ( |log OR| / √Var(log OR) - z1-α )

These two methods give similar results, with the odds ratio method usually calculating a slightly higher power than the alternative, and consequently giving slightly lower sample size estimates.
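For illustration, here is a minimal sketch of the odds-ratio method for a parallel binary trial, using the OR definition given earlier and the standard Var(log OR) approximation. The input numbers are hypothetical and this is not the thesis's SAS program:

```python
import math
from statistics import NormalDist

def n_per_arm_or_method(p_a, odds_ratio, alpha_two_sided=0.05, beta=0.10):
    """Approximate per-arm sample size for a parallel binary superiority
    trial via the odds-ratio method: the test statistic is log(OR), with
    Var(log OR) ~ 1/(n*pA*(1-pA)) + 1/(n*pB*(1-pB))."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha_two_sided / 2) + nd.inv_cdf(1 - beta)
    odds_a = p_a / (1 - p_a)
    odds_b = odds_a / odds_ratio        # pB implied by OR = oddsA / oddsB
    p_b = odds_b / (1 + odds_b)
    var_terms = 1 / (p_a * (1 - p_a)) + 1 / (p_b * (1 - p_b))
    return math.ceil(z ** 2 * var_terms / math.log(odds_ratio) ** 2)

# Hypothetical inputs (not from the text): pA = 0.4, a clinically relevant
# OR of 2, two-sided alpha of 5%, 90% power
print(n_per_arm_or_method(0.4, 2.0))  # → 208 per arm
```

The proportional-difference method would instead work directly with |pA - pB| and the binomial variances, typically returning a slightly larger per-arm figure, in line with the comparison above.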

Crossover trials offer various methods for power calculations, including an approximate odds ratio (OR) approach. When analyzing a crossover trial with binary outcomes, it is essential to evaluate the expected results based on the proportions of the four potential outcome combinations for the two treatments.

Table 1.1: Expected proportions of combined treatment binary outcomes

In clinical trials, researchers anticipate that some subjects will either succeed or fail on both treatments, but the key to identifying differences in treatment effects lies in the proportion of subjects with discordant outcomes. The McNemar test is commonly used to analyze such binary data, focusing solely on the numbers of discordant pairs. By framing the sample size calculation around the discordant results, researchers can determine the necessary discordant sample size for the analysis. Additionally, approximating the conditional odds ratio can facilitate the calculation of the required discordant sample size.

To estimate the total sample size (NC) in relation to discordant samples (nd), divide the discordant sample size by the expected discordant proportion, as outlined in Table 1.1.
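The scaling step described here is a one-liner; the numbers below are hypothetical, chosen only to show the arithmetic:

```python
import math

def total_from_discordant(n_discordant, p_discordant):
    """Scale the required number of discordant pairs up to a total crossover
    sample size: NC = nd / (expected discordant proportion), rounded up."""
    return math.ceil(n_discordant / p_discordant)

# Hypothetical numbers: if the McNemar calculation asks for 80 discordant
# pairs and 40% of subjects are expected to be discordant, recruit 200.
print(total_from_discordant(80, 0.40))  # → 200
```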

Conner and Miettinen propose an approximate odds ratio (OR) method for calculating sample size that requires fewer assumptions about the relationship between discordant and total sample sizes, leading to recommendations for larger sample sizes. Julious introduces an alternative approach, suggesting that the power and sample size calculations for parallel trials can be reused via the OR method, positing that a crossover trial with n subjects has similar power to a parallel trial with 2n subjects. This raises questions about the advantages of crossover trials for binary versus continuous data, as the continuous case might suggest a crossover trial could achieve similar power with fewer subjects. However, Julious provides mathematical and empirical support indicating that for binary outcomes the number of 'patient sessions' needed remains roughly the same for both trial types. Nonetheless, crossover trials offer distinct benefits for binary outcomes, such as ensuring treatment groups are balanced in prognostic factors, thus enhancing confidence that observed differences stem from treatment effects.

The four crossover methods give fairly similar results, but the Conner and Miettinen methods give slightly lower power estimates than the other two.

Sample size and power calculations for ordinal data

Ordinal data shares similarities with binary data, with binary being a specific instance of ordinal. Both types of data allow for various hypothesis formulations. By making reasonable assumptions about the treatment effects across the different levels, we can effectively analyze and interpret the results.

The odds ratio (OR) based method is the simplest approach for determining power and sample sizes in this setting. When dealing with ordinal data divided into n levels, it is assumed that the odds ratios are constant across these levels (OR1 = OR2 = … = ORn-1). This assumption underlies the Mann-Whitney U test, which is commonly used for analyzing ordered categorical data, as noted by Conover (1980). However, if there is reason to believe a new treatment could have varying effects, such as shifting more individuals into extreme categories, then this assumption may not hold, rendering the Mann-Whitney U test inappropriate for such scenarios.

Formulating a hypothesis and estimating the variance of the outcome statistic for odds ratios (OR) is more straightforward than performing similar calculations for absolute risk reduction This difference highlights the complexities involved in sample size and power calculations.

For parallel trials, a power based on an OR method can be used. Using the same arguments as for binary parallel trials, the power can be calculated as

1-β ≈ Φ( |log OR| / √Var(log OR) - z1-α )

When k=2, the equation aligns with the Odds Ratio equation used in binary trials. Binary trials, characterized by two levels, represent a specific instance of the broader ordinal formula, demonstrating the relationship between the binary Odds Ratio and the general ordinal analysis.
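A widely used approximation in this setting, often attributed to Whitehead (1993) and not necessarily the exact formula used in the thesis, computes the total sample size from the arm-averaged category probabilities under the constant-OR (proportional odds) assumption. All input values below are hypothetical:

```python
import math
from statistics import NormalDist

def ordinal_total_n(control_probs, odds_ratio, alpha_two_sided=0.05, beta=0.10):
    """Total sample size for a 1:1 parallel trial with an ordinal outcome,
    assuming a common odds ratio at every cut-point (proportional odds).
    Uses Whitehead's approximation:
        N = 12*z^2 / (log(OR)^2 * (1 - sum(pbar_i^3)))
    where pbar_i are the category probabilities averaged over the two arms."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha_two_sided / 2) + nd.inv_cdf(1 - beta)
    # Shift each cumulative probability of the control arm by the common OR
    cum = 0.0
    cum_control, cum_treat = [], []
    for p in control_probs[:-1]:
        cum += p
        odds = odds_ratio * cum / (1 - cum)
        cum_control.append(cum)
        cum_treat.append(odds / (1 + odds))
    # Convert cumulative probabilities back to per-category probabilities
    def to_cats(cums):
        cums = cums + [1.0]
        return [b - a for a, b in zip([0.0] + cums, cums)]
    pbar = [(a + b) / 2 for a, b in zip(to_cats(cum_control), to_cats(cum_treat))]
    denom = math.log(odds_ratio) ** 2 * (1 - sum(p ** 3 for p in pbar))
    return math.ceil(12 * z ** 2 / denom)

# Hypothetical: control-arm category probabilities 0.3/0.4/0.3, OR = 2
print(ordinal_total_n([0.3, 0.4, 0.3], 2.0))  # → 300 in total
```

With two categories and equal arm probabilities, the formula collapses to the binary odds-ratio sample size, illustrating the k=2 reduction described above.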

Crossover trials can be analyzed using two methods. One approach is to assume that a crossover trial involving n subjects has power comparable to a parallel trial with 2n subjects. Alternatively, the power can be estimated using specific statistical formulae.

The var(log(OR)) method usually gives lower power estimates than the parallel OR method, and correspondingly higher sample size estimates.

SAS Programs for Calculating Sample Size

Computing approach to sample size calculation

Calculating sample size and power can be complex, making computer software essential for solving these equations. SAS and R are ideal platforms for this purpose, particularly because they can handle incomplete blocks crossover trials, a design type for which no widely used package currently exists.

SAS, or Statistical Analysis System, is a specialized statistical software widely utilized in the pharmaceutical industry for analyzing clinical trial data. With a presence in 40,000 locations globally, SAS is employed by 96 of the top 100 companies on the FORTUNE Global list.

SAS is the primary software utilized in the pharmaceutical industry for analyzing trial results; however, statisticians often resort to specialized packages for sample size calculations. Having dedicated programs within SAS for these calculations would streamline the process, eliminating the need to switch between different platforms.

SAS features a modular design, with SAS/IML specifically focused on handling data matrices. This component is essential for the SAS programs here, as its matrix language enables crucial operations such as matrix multiplication and inversion, which are vital for conducting power and sample size calculations.

R is widely used in academia and benefits from a large user base that continually enhances its functionality through custom programs. One notable add-on is the Rpanel package, which enables programmers to create graphical user interfaces for R functions. This feature allows users to easily adjust parameters, resulting in a more enjoyable and efficient experience. Rpanel programs are designed for user-friendliness, offering accessible interfaces and improved graphical output when needed.

The goal is to develop a series of user-friendly programs for various trial designs, accommodating normal, binary, and ordinal data types. The SAS programs will be designed for simplicity and efficiency, while the R panels will enhance accessibility and leverage advanced graphical capabilities.

This chapter contains examples of SAS programs for calculating sample sizes and power for different types of superiority trial, and instructions for the programs’ use.

The first four programs deal with Normally distributed outcome variables, and there are also programs to tackle problems with binary and ordinal outcomes.

SAS program for Normal Data

First, a program to calculate power for a contrast between two treatments with Normal data, for crossover or parallel trials. This program is based around formula 1.3.

The program requires users to input data for treatment sequences, specify a sample size, and determine the desired power level. Additionally, users must provide parameters for the calculations and identify the treatments to be compared.

The program computes the covariance between two treatment effects for both fixed and random effects. It uses the provided sigma and delta to calculate ∆, starting with R=1. The degrees of freedom (df) are determined by analyzing the entered design to extract S, P, and T. With all necessary variables stored in memory, the program can then calculate the power for the sample size when R=1.

If the power calculated at R=1 is not at least as big as the power required, then the ∆ and df for R=2 are computed and used in the power calculation, and so on, incrementing R until the required power is reached.

A: Displays whether design is balanced or not

When comparing multiple treatments, it is essential to identify the specific treatments being analyzed, as the statistical power to test different contrasts can vary significantly based on the study design.

C: The parameters that were used in the calculations

D: Power calculated for Fixed and Random Effects

E: The required sample size to achieve the requested power

Example 2.1.1: AB/BA Crossover trial

An investigator plans to conduct a crossover trial to evaluate the superiority of a new drug, B, over drug A, focusing on a clinically relevant difference of 1. With both within-subject and between-subject standard deviations set at 1, the study aims to enroll 20 subjects, divided into two sequences of 10 each. What is the power of this trial to detect the specified difference with a one-sided alpha level of 2.5%?

Using program 2.1, he should set seq to {1 2, 2 1} for this design, and let delta, sig and lam all equal 1. R should be 10.

Running the program, he would find the design has a power of 0.84844 for both random effects and fixed effects.

In Example 2.1.1, the investigator opts to modify the trial design to a parallel trial, assigning equal numbers to treatments A and B. He aims to assess the impact of this change on the trial's statistical power while maintaining a sample size of 20. Additionally, he seeks to determine the sample size required to achieve a power of at least 0.84844, comparable to that of the original crossover trial.

Using Program 2.1 with seq set to {1, 2} and beta set to (1-0.84844), the random effects power is 0.32175; a parallel design rules out a fixed effects model. Consequently, to match the power of the previous trial, the sample size must increase from 20 to 74 subjects, with 37 subjects per arm.
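The "increase the sample size until the power is big enough" loop that these programs use can be sketched for the parallel case with the normal approximation (formula 1.2 with a z rather than a t quantile). Note that it returns 36 per arm for the parallel example above, slightly fewer than the 37 the exact non-central t calculation requires, consistent with formula 1.2's slight overestimate of power:

```python
from statistics import NormalDist

def parallel_n_per_arm(delta, var_total, target_power, alpha_one_sided=0.025):
    """Smallest per-arm n whose normal-approximation power reaches the
    target, found by the same 'increase the size until the power suffices'
    loop the SAS programs use for block repetitions."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha_one_sided)
    n = 2
    while True:
        se = (2 * var_total / n) ** 0.5   # SE of the difference in arm means
        if nd.cdf(delta / se - z_crit) >= target_power:
            return n
        n += 1

# The parallel example above: delta = 1, total variance
# sigma_b^2 + sigma_w^2 = 2, matching the crossover power of 0.84844.
print(parallel_n_per_arm(1.0, 2.0, 0.84844))  # → 36
```

A production version would use the non-central t power (formula 1.3) inside the loop, which is what pushes the answer up to 37 per arm (74 in total).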

The investigator has decided to implement a more adventurous incomplete blocks design featuring five distinct treatment types. The focus is on determining the statistical power to detect a clinically relevant difference between treatment A and treatment E, given a sample size of 20. Additionally, the analysis will explore the sample size required to achieve a power of 90%.

Figure 2.1.2 shows how to code the sequence into Program 2.1. Beta should be set to 0.1.

The calculated power for fixed effects is 0.316, while random effects yield a higher power of 0.384. To achieve 90% power with a fixed effects analysis, 90 subjects are necessary, whereas only 70 subjects are needed when modeling subject variance as a random effect.

SAS Program for Normal Data 2

The existing program effectively handles basic trials, but its capabilities are restricted. A more advanced program that can calculate the power of contrasts between treatment pairs in crossover or parallel trials with Normal data, while accommodating complex treatment sequences, would be beneficial. This enhanced tool would be particularly useful in scenarios where subjects are not randomly assigned in equal numbers to the treatment sequences, whether by design or unforeseen circumstances.

The program utilizes statistics similar to its predecessor, employing the power calculation 1-pt(t1-α,df, df, ∆). However, the degrees of freedom (df) and ∆ are computed differently to accommodate the distinct method of sequence entry.

A: Displays whether design is balanced or not

B: The treatments to be compared

C: The sequences, and number of subjects assigned to each sequence

D: The parameters that were used in the calculations

E: Power calculated for Fixed and Random Effects

In an AB/BA crossover trial aimed at determining the superiority of drug B over drug A, the investigator seeks to assess the power of the study to detect a clinically relevant difference of 1, given that both the within-subject and between-subject standard deviations are also 1. With a total of 20 subjects enrolled, a clerical error resulted in 13 subjects assigned to one sequence and 7 to the other. The investigator is interested in the power of the trial under this enrollment.

Using program 2.2, he should set seq to {1 2, 2 1} for this design, and seqreps to {13,7}.

The power has dropped slightly from the optimal 0.848 to 0.814 for both random effects and fixed effects.

(NB: The output describes the design as 'balanced'. In these programs, 'balanced' refers to equal power across the contrasts between the various treatments. A design featuring only two treatments is inherently balanced, as it presents a single contrast between the two options, 1 vs 2.)

SAS Program for Normal Data 3

This program efficiently calculates the power and necessary sample sizes for various treatment contrasts in both crossover and parallel trials involving Normally distributed data. It is particularly beneficial for users seeking to evaluate the power of all contrasts simultaneously.

A: Displays whether design is balanced or not

B: The power of contrasts between all pairs for Fixed Effects

C: The power of contrasts between all pairs for Random Effects

D: The number of repetitions required to achieve power between each of the pairs for Fixed Effects

E: The number of repetitions required to achieve power between each of the pairs for Random Effects

In a 2-period, 5-treatment trial, the investigator aims to determine the power of contrasts between all treatments and the necessary sample size to achieve a 90% power level for each contrast.

(Fixed Effects) Power between pairs with 2 repetitions.

Fixed Effects Power
            Treatment1  Treatment2  Treatment3  Treatment4  Treatment5
Treatment1  0           0.1384528   0.1066142   0.1066142   0.1384528
Treatment2  0.1384528   0           0.1384528   0.1066142   0.1066142
Treatment3  0.1066142   0.1384528   0           0.1384528   0.1066142
Treatment4  0.1066142   0.1066142   0.1384528   0           0.1384528
Treatment5  0.1384528   0.1066142   0.1066142   0.1384528   0

(Random Effects) Power between pairs with 2 repetitions.

Random Effects Power
            Treatment1  Treatment2  Treatment3  Treatment4  Treatment5
Treatment1  0           0.1634582   0.139711    0.139711    0.1634582
Treatment2  0.1634582   0           0.1634582   0.139711    0.139711
Treatment3  0.139711    0.1634582   0           0.1634582   0.139711
Treatment4  0.139711    0.139711    0.1634582   0           0.1634582
Treatment5  0.1634582   0.139711    0.139711    0.1634582   0

(Fixed Effects) Required number of complete Replications to attain 0.9 power by pair

Fixed Effects Repetitions Required Treatment1 Treatment2 Treatment3 Treatment4 Treatment5

(Random Effects) Required number of complete Replications to attain 0.9 power by pair

Random Effects Repetitions Required Treatment1 Treatment2 Treatment3 Treatment4 Treatment5

The results from program 2.3 indicate an unbalanced design, with power values of 0.107 or 0.138 for fixed effects and 0.140 or 0.163 for random effects. The random effects method proves more powerful, requiring fewer repetitions for 90% power, specifically between 14 and 18 reps (70 or 90 subjects), compared to the fixed effects method, which necessitates 18 or 26 reps (90 or 130 subjects).

SAS Program for Normal Data 4

This program efficiently calculates the power and required sample size for crossover or parallel trials involving normally distributed data, accommodating all treatment pairs simultaneously. Additionally, it provides flexibility by allowing users to input irregular treatment sequences.

A: Displays whether design is balanced or not

B: List of the sequences and the number of subjects assigned to each

C: The power of contrasts between all pairs for Fixed Effects Here, fixed effects analysis is not possible, so a message explaining this is displayed instead

D: The power of contrasts between all pairs for Random Effects

SAS Program for binary Data

This is a program to calculate powers and sample sizes for trials with binary outcomes, for AB/BA crossover designs or parallel trials with two treatments.

The program takes the trial parameters as input and computes the power and sample size by the four binary methods described in the previous chapter.

A: The parameters used in the calculation

B: The power of the design, calculated by different methods, and the number of subjects required by each of the methods to achieve the requested power

In a clinical trial assessing the efficacy of new drug B in reducing the incidence of a condition currently affecting 40% of subjects on treatment A, researchers aim to determine the necessary sample size for an AB/BA crossover design With a two-sided alpha level set at 5%, the study seeks to achieve 90% power to detect a clinically significant odds ratio (OR) of 2.

Output from such a query is shown on fig 2.5.1 Between 200 and 206 subjects would be required, depending on method of calculation.

SAS Program for ordinal data

This is a program to calculate powers and sample sizes for trials with ordinal outcomes, for AB/BA crossover designs or parallel trials with two treatments.

A: The level and cumulative probabilities each level assumed for both treatments

B: The OR, two-sided alpha and the beta used in calculations

C: The power of the design as calculated by the different methods and the minimum sample size required to achieve desired power

R Programs for calculating sample size

This chapter discusses R programs designed to compute powers and sample sizes for various trial designs and response variable types. An initial justification for using R was given in Chapter 2. The programs are built on the rpanel add-on package [Bowman, 2006].

As for the SAS programs in the previous chapter, here there are programs to deal with Normal, binary and ordinal data.

R panel program for Normal data

This program enables the calculation of necessary sample sizes and the power of contrasts for both crossover and parallel designs, following statistical procedures akin to those utilized in program 2.1.

A: Text entry box to specify the location of the TXT file containing the treatment sequences for the design. Press return after entering the file path so the program recognises the new instruction. Note that in R, file paths use the forward slash “/” as the separator, rather than the backslash “\”.

B: The treatments, identified by number, whose contrasts are to be analysed Press the “-” and “+” buttons to change the treatments compared.

C: Number of repetitions of the treatment sequences to be used in trial.

Click on the “-” and “+” buttons to change the number of repetitions.

D: The δ of the design, the size of the difference to be detected. Press return after entering the new delta for the program to recognise the new instruction.

E: The σ_within, the within-patient standard deviation. Click on the “-” and “+” buttons to decrease or increase the value.

F: The β of the trial, the size of the type II error. Click and drag the slider to change the β, and thus 1-β, the desired power of the trial.

G: λ, the ratio of σ²_between to σ²_within. Click on the “-” and “+” buttons to decrease or increase the value.

H: The one-sided α, the size of the type I error. Click on the “-” and “+” buttons to decrease or increase the value.

I: Button to display information on the selected treatment sequence.

Information is outputted to the R log.

J: Radio button to change between Fixed and Random effects plots and analysis.

The 5-period, 7-treatment, 21 patient incomplete block crossover design, as illustrated in Fig 3.1.2, can be formatted for a txt file by organizing each subject's regimen in separate rows Each row should be separated by a single carriage return, and the treatments assigned to each period must be distinguished by a single space.

Figure 3.1.3 illustrates the output from program 3.1 for an AB/BA crossover trial with 14 subjects analysed using random effects. The histogram plots power against sample size, with sample sizes ranging from zero up to either the number needed to exceed the specified power or the size entered in 3.1.1C, whichever is greater. Bars are green, except the bar for the stated number of repetitions, which is red if Fixed Effects is selected in 3.1.1J or blue if Random Effects is selected. The required power is marked by a dashed line, and the power at the stated sample size is displayed above the plot for both methods, in red for Fixed Effects and blue for Random Effects, with the result for the selected model shown in a larger font. Here the calculated power for this complete blocks crossover is 72.1% under both methods. The treatments compared, the sample size, the total number of treatments and the number of sequences are also displayed, along with the sample size calculated to achieve the desired power, in the matching colour scheme; in this case 24 participants are required to achieve a power of 90%.

Figure 3.1.4 presents the output log for an AB/BA crossover trial upon clicking button 3.1.1I It initially shows the analyzed sequence in its original format (refer to the analysis in figure 3.1.2 for further details) Subsequently, it indicates whether the design is balanced and provides a summary of the design type.

In this example the design is correctly assessed as a balanced complete blocks design.

R panel program for binary data

This program allows the user to calculate sample sizes for trials with binary outcome response variables It uses the sample size calculation methods mentioned in chapter 1.

A: pA, the response anticipated on treatment A.

B: Radio button to select between inputting Odds Ratio or pB, the response anticipated on treatment B.

C: Text entry box to enter either Odds Ratio or pB.

D: The two-sided α, the size of the type I error. Click on the “-” and “+” buttons to decrease or increase the value.

E: The β of the trial, the size of the type II error. Click and drag the slider to change the β, and thus 1-β, the desired power of the trial.

F: Text entry box to enter n, the prospective trial size. For parallel trials n is the size of each arm; for crossover trials n is the total trial size.

G: Radio button to choose between parallel trial and crossover trial.

Parallel trials are trials with two different treatments or treatment levels with equal allocation to each arm Crossover trials are AB/BA designs with equal allocations to each sequence.

H: Button to show results in log.

The output displayed in Figure 3.2.2 serves as a log, highlighting the variables utilized, including pA, pB, and OR It calculates the value of pB or OR if not provided by the user.

In this analysis, the power of the contrast between treatments A and B is evaluated for a crossover trial involving 100 subjects Four methods of power calculation are employed for this crossover design, yielding similar results with power estimates ranging from 61.2% to 63.1%, with Conner’s method providing the lowest estimate, which is a common occurrence Conversely, if a parallel trial design were chosen, only two methods of power calculation would be utilized.

The necessary sample sizes for achieving the desired statistical power are outlined for each power calculation method, with results generally consistent across methods However, the Conner method typically requires the largest sample size For instance, to attain a power of at least 80%, the estimated sample size ranges from 150 to 156 participants.

R panel program for ordinal Data

Parallel Trial sample size, Normal Data

In a clinical study example from "Sample Size Tables for Clinical Studies, 2nd Edition" by Machin et al (1997), an investigator aims to compare the blood pressure changes caused by a placebo and a drug To detect a difference of 5 mmHg between the two groups, with a standard deviation of 10 mmHg, the researcher must determine the necessary patient recruitment Additionally, the calculation will change if the expected effect size is increased to 10 mmHg.

The authors calculate the required sample size to detect 5 mmHg to be 172,and 44 for 10 mmHg.

The answers calculated by program 2.1 would also be 172 for 5 mmHg, but it calculates a larger sample of 46 should be used for 10 mmHg Using nQuery also gets answers of 172 and 46.

Fig 3.4: Machin’s sample size formula

Machin et al use a corrected-Normal approximation to the cumulative non-central t distribution for their sample size calculations. Their formula is a special case of the general equation 1.2, tailored to parallel trials with equal numbers in each arm, with a correction term added to bring it closer to the exact equation 1.3. This simplifies calculation and performs better than the raw 1.2 equations, but it can still produce a consistent, if small, underestimation of the required sample size for Normal data.
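A corrected formula of this kind is easy to implement. The sketch below (in Python rather than the thesis's SAS/R, using the standard corrected-Normal expression n per group = 2(z1-α/2 + z1-β)²/Δ² + z²1-α/2/4) reproduces the 172 and 44 totals quoted above:

```python
from math import ceil
from statistics import NormalDist

def machin_n_per_group(delta_over_sigma, alpha=0.05, power=0.90):
    """Corrected-Normal sample size per group for a two-arm parallel
    trial with equal allocation and a two-sided alpha."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    n = 2 * (z_a + z_b) ** 2 / delta_over_sigma ** 2 + z_a ** 2 / 4
    return ceil(n)

# 5 mmHg difference with sd 10 mmHg -> standardised difference 0.5
print(machin_n_per_group(0.5))  # 86 per group, 172 in total
print(machin_n_per_group(1.0))  # 22 per group, 44 in total
```

Without the final z²/4 correction term this would be the raw equation-1.2 calculation, which is the source of the slight underestimation discussed above.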

In a study by Wollard and Couper (1981), the effectiveness of moducren versus propranolol as initial treatments for essential hypertension was evaluated The researchers aimed to assess the change in blood pressure resulting from each medication With a limited recruitment of approximately 50 patients per drug and targeting a medium effect size of ∆= 0.5, they sought to determine the statistical power of their test at a two-sided significance level of alpha=0.05.

Machin et al obtained a power estimate of approximately 0.70 using standard tables. Program 2.1 gives a power of 0.697, agreeing with Machin and showing that sample size tables are typically sufficient for simple designs. Researchers may also prefer the convenience of consulting a book of tables over running software, without having to worry whether values have been entered correctly.

Crossover trial, Normal Data

Julious (2005) provides a comprehensive list of sample sizes required for achieving 90% power in AB/BA crossover trials, which align with the findings from Programs 2.1 and 3.1 For instance, a Δ of 0.1 necessitates a sample size of 2104, a figure corroborated by both programs Thus, a set of sample size tables is adequate for this straightforward crossover trial design, provided that the Δ and type I and type II error rates adhere to conventional standards.

Parallel Trial sample size, binary Data

Again from Machin et al, in their example 3.1 they ask: with a two-sided alpha = 0.10, π1 = 0.25 and π2 = 0.65, how many subjects are needed for a parallel trial with equal numbers in each arm to achieve a power of 0.9? Their answer is 25 in each arm, 50 in total.

From program 2.5, the Proportional Difference method gives 23 in each arm, totalling 46; by Odds Ratio, 24 are needed in each arm, making 48 overall. These results are all fairly similar.

Machin goes on to ask how changing the two-sided alpha to 0.05 affects the sample size, giving 62 as the answer. By program 2.5 we get 56 by proportional difference and 58 by OR.
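The uncorrected proportional-difference calculation can be sketched as follows (a Python illustration, not the thesis's program 2.5; it reproduces the 46 and 56 totals above):

```python
from math import ceil
from statistics import NormalDist

def n_per_arm_proportions(p1, p2, alpha=0.05, power=0.90):
    """Normal-approximation sample size per arm for comparing two
    proportions by their difference (uncorrected)."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil(z ** 2 * variance / (p1 - p2) ** 2)

print(2 * n_per_arm_proportions(0.25, 0.65, alpha=0.10))  # 46 in total
print(2 * n_per_arm_proportions(0.25, 0.65, alpha=0.05))  # 56 in total
```

Machin's slightly larger answers (50 and 62) come from adding a continuity-type correction to this basic formula.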

Crossover Trial, Binary data

In an example from Julious (2005), the investigator anticipates a marginal response of 40% on the control therapy and wishes to detect an effect size, in terms of OR, of 2.0, with Type I and Type II error rates of 5% and 10% respectively. Julious concludes that approximately 200 participants are needed. Entering pA = 0.4, a Type I error of 5%, 90% power and OR = 2 into the thesis's programs yields estimates between 200 and 206 participants, in line with Julious's answer.

Parallel Trial, Ordinal data

In Machin's example (3.10), with control levels set at pA1=0.14, pA2=0.24, pA3=0.24, and pA4=0.38, a sample size of approximately 90 is needed to achieve 80% power to detect an odds ratio (OR) of 3 at a 5% alpha level Utilizing program 3.3, the calculated sample size is 92 subjects, closely aligning with Machin's estimate, which provides reassurance regarding the accuracy of the calculations.

Crossover Trial, Ordinal data

Julious gives an example with pA1 = 0.08, pA2 = 0.191, pA3 = 0.473 and pA4 = 0.256, asking what sample size is required, with alpha = 0.05, to detect a given OR. By this thesis's programs almost identical results are achieved: 214 or 230, depending on the preferred method.

Incomplete Block Design, Normal data

Because no other program or table could be found that can help calculate power for this type of design, another method was found to validate the programs.

In a balanced incomplete blocks trial with a sample size of 39 subjects, divided into 3 treatments across 2 periods, the power of the study can be determined with specific parameters: delta set at 1, sigma at 1, lambda at 1, and a one-sided alpha level of 0.025, treating the subject as a fixed effect.

The trial setup was simulated 100,000 times, revealing a significant difference in 85,923 instances, indicating a power of approximately 85.9% Calculations using programs 3.1 and 2.1 confirmed a power of 86.0%, demonstrating the reliability of the program for incomplete block crossover trials.

Discussion of comparisons

The comparison between the thesis's programs and nQuery software, along with sample size tables, demonstrates a strong alignment, with only minor discrepancies that are unlikely to significantly affect trial design This confirms the validity of the programs presented in the previous two chapters However, the inability of nQuery and Machin's tables to accommodate more complex trial designs necessitated a simulation-based approach to validate the sample size for an Incomplete Block design, highlighting the limitations of those traditional methods in contrast to the thesis's innovative programs.

The use of sample size calculations

Sample size calculations are crucial for trial designers, as they help optimize resource use by ensuring an appropriate number of subjects are randomized and preventing unrealistic trials While tools and software for these calculations are readily available, their application remains limited; a survey of surgical trials revealed that only 38% reported using sample size calculations prior to the study This oversight often results in trials being conducted with insufficient participants, significantly reducing the likelihood of detecting meaningful treatment effects, which poses both scientific and ethical concerns.

Perhaps almost as worrying, even when sample size calculations are known to have taken place they are often inaccurate [Freiman et al, 1978] [Vickers, 2003].

This chapter examines the conventions and challenges associated with parameter selection and estimation for sample size calculations It highlights the inherent imperfections in variance estimates and offers recommendations for effectively addressing this uncertainty through alternative sample size calculation methods.

Alpha, beta and the treatment difference

When calculating sample size, the acceptable rates of type I and type II error must be considered. Conventionally it is thought more important to avoid type I errors, where a true null hypothesis is incorrectly rejected, than type II errors, where a false null hypothesis fails to be rejected. Consequently, α is usually set to be smaller than β.

In superiority trials, a one-sided alpha level of 0.025 is commonly employed for sample size calculations, while the beta value can vary from 0.4 to 0.1, corresponding to a desired power of 60% to 90% Ensuring that clinical trials meet regulatory and financial standards is crucial, as trials with insufficient power may be deemed unethical in certain circumstances.

A clinical expert assesses the size of a clinically relevant difference, expressed as δ or as an odds ratio (OR). A smaller δ or log(OR) necessitates a larger sample size to detect it. When the absolute difference between pA and pB is of primary interest, pB can be derived from pA and the OR.
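The derivation of pB from pA and the OR is a one-line calculation; a minimal sketch:

```python
def pb_from_pa_or(pa, odds_ratio):
    """Response rate on treatment B implied by the rate on A and the OR:
    odds_B = OR * odds_A, then convert odds back to a probability."""
    odds_b = odds_ratio * pa / (1 - pa)
    return odds_b / (1 + odds_b)

# pA = 0.4 and OR = 2 (the binary crossover example above) imply pB
print(round(pb_from_pa_or(0.4, 2.0), 4))  # 0.5714
```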

Sample standard deviation as an estimator of population standard deviation

Power and sample size calculations for Normally distributed data require that a value for population standard deviation be entered into an equation.

Equation 1.3 treats the true population standard deviation as known, but this is not realistic: to know the true population variance for some endpoint, one would have to measure the attribute in every member of the population in question. For equations like 1.3 a point estimate must be made for sigma.

s given sigma

To evaluate the accuracy of a standard deviation estimate, the relationship between the true standard deviation σ and the sample standard deviation s must be understood. The quantity m·s²/σ², where s² is the sample variance and m = n-1 (n being the number of subjects), follows a chi-squared distribution with m degrees of freedom. Hence s²/σ² is distributed as a chi-squared divided by m, and s/σ as the square root of that distribution.

Figure 4.1 illustrates the probability density function (pdf) of the sample standard deviation (s) derived from a sample size of 5 The true population standard deviation (σ) is indicated by a red dotted line, while the blue dotted lines represent the 95% prediction interval for s The expected value of s is depicted by a solid green line, and the median s is shown with a dot-dash blue line Although the distribution appears nearly normal, it exhibits a slight left skew and a long right tail The 95% interval is wide yet centered around the true value of σ.

The statistic s² serves as an unbiased estimator for σ², while s itself is only asymptotically unbiased for σ In practical applications, s tends to underestimate σ, particularly with smaller sample sizes For instance, with a sample size of 5, the expected value of s is approximately 94% of σ Additionally, at 4 degrees of freedom, s underestimates σ around 59.4% of the time Consequently, when derived from a pilot study with a sample size of 5, s is likely to be lower than the values observed in larger trials, occurring about 56.8% of the time in a trial of size 20.
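The figures just quoted can be checked directly: E(s|σ) = c4·σ with c4 = √(2/m)·Γ((m+1)/2)/Γ(m/2), and P(s < σ) = P(χ²_m < m), which has a closed form for even m. A Python sketch (illustrative only, not thesis code):

```python
from math import gamma, sqrt, exp, factorial

def c4(n):
    """E(s)/sigma for a sample of size n (m = n - 1 degrees of freedom)."""
    m = n - 1
    return sqrt(2 / m) * gamma((m + 1) / 2) / gamma(m / 2)

def chi2_cdf_even(x, m):
    """Chi-squared CDF, closed form valid for even degrees of freedom m."""
    return 1 - exp(-x / 2) * sum((x / 2) ** k / factorial(k)
                                 for k in range(m // 2))

# Sample of size 5 (m = 4): E(s) is about 94% of sigma ...
print(round(c4(5), 3))                  # 0.94
# ... and s falls below sigma with probability P(chi2_4 < 4)
print(round(chi2_cdf_even(4.0, 4), 3))  # 0.594
```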

Using the observed s from a sample size of 5 as an estimate ofσ is undesirable, the prediction interval is very wide so there is no reassurance that s is close toσ.

Figure 4.2 shows the distribution of s from a sample with 10 degrees of freedom, for which the expected value of s is 97.5% of σ. This is an improvement, but the prediction interval remains broad: there is a 95% probability that s falls between 56.98% and 143.12% of σ. This large interval means s is still an unreliable estimator of σ.

As the sample size increases to 26, the expected value of s remains below σ but is within 1% of it, and the 95% prediction interval for s is still wide, from 72.4% to 127.5% of σ. Larger sample sizes give more reliable, less biased estimates of σ, with prediction intervals becoming increasingly symmetric around the true value. At 50 degrees of freedom the expected value of s is 99.5% of σ, with a prediction interval of roughly ±19.5%; by 100 degrees of freedom this interval has narrowed to about ±13.8%. To be 95% confident that the observed s is within 3% of the true σ, a sample of approximately 2,134 is required, by which point s is nearly unbiased, with expected value around 0.9999σ.

The required sample size n to be (1-β)·100% confident that s is within x% of σ is approximately

n = (1/2)·(Z1-β / (x/100))²

From this formula, one can see that to double accuracy one must quadruple sample size.
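A quick numerical check of this rule of thumb (a Python sketch; Z is taken as the two-sided 95% quantile, 1.96, which matches the 2,134 figure quoted above):

```python
from statistics import NormalDist

def n_for_sd_accuracy(x_percent, conf=0.95):
    """Approximate sample size so that, with the given confidence, the
    sample sd s lies within x% of the true sigma (normal approximation
    to the distribution of s/sigma)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return round(0.5 * (z / (x_percent / 100)) ** 2)

print(n_for_sd_accuracy(3))  # 2134: within 3% of sigma
print(n_for_sd_accuracy(1))  # 19207: within 1% ("approximately 19,000")
```

Halving x from 3% to 1.5% quadruples n, as the text states.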

It is the probability characteristics of sigma given an observed s that are more important, as any estimate we make will be for sigma based on observed s.

Figure 4.4 shows the pdf of σ given an observed s of 1 from a sample of size 5. Comparing with fig 4.1, which shows s given σ, it can be seen that the σ distribution is far more skewed. The 95% confidence interval for (σ|s) at 4 degrees of freedom is much wider and more asymmetrical than the 95% prediction interval for (s/σ|σ), but they are related: if the 95% PI for (s/σ|σ) is (a,b) then the 95% CI for (σ/s|s) is (1/b, 1/a).

The relationship between E(s|σ) and E(σ|s) is not reciprocal: at 4 degrees of freedom the expected value of σ given s is 1.2518·s, whereas the expected value of s given σ is much closer to σ. A sample of this size is insufficient for making precise statements about σ, which has a 95% probability of lying between 60% and 287% of s, a substantial margin of uncertainty.

Figure 4.5 shows (σ|s) when the observed s is 10 rather than 1, with the same sample size. The distribution simply scales: both the confidence intervals and the expected value are multiplied by 10.

If the sample size were 11, the confidence interval tightens and becomes a little more symmetric (0.698717, 1.754934), and the expected value of σ drops to 1.0837·s.

By 25 degrees of freedom the distribution looks less skewed and the CI is more symmetric. The expected value of σ is only 3% greater than s.

As the sample sizes increase the trends with skewness and confidence intervals continue The distributions for (s|σ) and (σ|s) get very similar as m increases.

Small sample sizes lead to unreliable estimates, and while increasing the pilot study can enhance accuracy, it requires a fourfold increase in sample size to achieve just double the accuracy To ensure that the estimated standard deviation (s) is within 1% of the true standard deviation (σ), a sample size of approximately 19,000 is necessary Consequently, striving for such precision with point estimates for σ may significantly strain the resources of the trial designer.

Methods of incorporating uncertainty over variance of Normal data into sample size calculations

Research indicates that the sample standard deviation (s) is not a reliable estimator for the population standard deviation (σ), often resulting in underestimation A study by Vickers (2003) revealed that approximately 80% of sample size estimates in major journals were too low, attributing part of this issue to the limitations of using s as an estimator.

To effectively address this issue, we can adjust the value of s by a multiplication factor that reflects our confidence level By knowing the size of the pilot study, we can determine these factors to ensure a specific probability of selecting an estimate that meets or exceeds σ Additionally, we can calculate E(σ/s|s,m) to obtain an unbiased estimate for σ Refer to Table 4.1 for the multiplicative factors corresponding to various sample sizes.
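The unbiasing factor E(σ/s | s, m) has a closed form in gamma functions, E(σ/s | m) = √(m/2)·Γ((m-1)/2)/Γ(m/2). A sketch of how entries of this kind for Table 4.1 might be computed (illustrative Python, not the thesis's table):

```python
from math import gamma, sqrt

def expected_sigma_over_s(m):
    """E(sigma/s | s, m): multiply an observed s by this factor to
    obtain an unbiased estimate of sigma (m = df of the estimate s)."""
    return sqrt(m / 2) * gamma((m - 1) / 2) / gamma(m / 2)

print(round(expected_sigma_over_s(10), 4))  # ~1.0837 (cf. the m = 10 example)
print(round(expected_sigma_over_s(25), 4))  # ~1.031 (about 3% above s)
```

Percentile-based factors (e.g. for 95% confidence of using at least σ) would instead come from the quantiles of √(m/χ²_m), which stdlib Python cannot invert directly.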

When selecting a multiplicative factor, sponsors must consider the acceptable risk of the trial being underpowered Opting for 95% confidence in utilizing at least σ often results in trial sizes that are significantly larger than necessary, leading to increased costs for reassurance.

An effective alternative is to employ calculation methods for sample size and power that focus on estimating expected power rather than relying on potentially unreliable point estimates By incorporating the distribution of (σ|s) into these calculations, researchers can achieve more accurate estimates of expected power.

Recall equations 1.2 and 1.3, the equations for when the population standard distribution is known.

Equation 1.2 is the Normal approximation of the cumulative non-central t distribution; equation 1.3 is the correct form.

Using the same terminology, it can be shown [Julious SA, 2005] that where the true variance is unknown but estimated from data with m degrees of freedom that

Expected (1-β) ≈ pt(τ, m, t1-α,df)    (4.1)

pt(τ, m, t1-α,df) approaches Φ(τ - t1-α,df) as m approaches infinity, so equation 4.1 can be considered the equivalent of equation 1.2 for unknown variance.

There is not an equally concise equation that gives the exact expected power, but arithmetic methods can be used to arrive at a more accurate estimate.

An arithmetic method can be employed to create a PDF for the sigma distribution based on parameters s and m, utilizing the inverse square-root chi-squared distribution By marking a large number (Q) of equally spaced quantiles of this distribution, power calculations can be conducted using these sigma values according to equation 1.3 As Q increases, the mean of the power estimates converges to the expected power, with the method described in subsequent chapters utilizing a Q of 999 This value strikes a balance between accuracy and computational efficiency, ensuring reliable estimations without excessive processing time on modern computers.

The method with a Q of 999 is referred to as Arithmetic Method 4.2 for the remainder of the thesis.
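The idea behind AM 4.2 can be sketched as below, with two simplifications flagged loudly: σ values are drawn by Monte Carlo rather than taken at 999 equally spaced quantiles, and each per-σ power uses the Normal approximation (equation 1.2 style) rather than the non-central t of equation 1.3, since base Python lacks the latter. The AB/BA contrast is assumed to have variance 2σ²/n for n total subjects:

```python
import random
from math import sqrt
from statistics import NormalDist

def expected_power_crossover(delta, s, m, n, alpha=0.025,
                             draws=20000, seed=1):
    """Average the Normal-approximation power over draws from the
    distribution of sigma given (s, m): m*s^2/sigma^2 ~ chi-squared(m)."""
    rng = random.Random(seed)
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - alpha)
    total = 0.0
    for _ in range(draws):
        chi2 = sum(rng.gauss(0, 1) ** 2 for _ in range(m))
        sigma = s * sqrt(m / chi2)             # a draw from (sigma | s, m)
        tau = delta / (sigma * sqrt(2.0 / n))  # AB/BA: contrast variance 2*sigma^2/n
        total += nd.cdf(tau - z_a)             # per-sigma power, eq. 1.2 style
    return total / draws

# Pilot: s = 1 on m = 10 df; planned AB/BA trial of n = 20 subjects
print(expected_power_crossover(delta=1.0, s=1.0, m=10, n=20))
```

Replacing the draws with the 999 equally spaced quantiles of (σ|s, m), and the Normal power with the non-central t power, recovers AM 4.2 as described.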

It would be useful to compare the results gained from equation 4.1 and arithmetic method 4.2 to see how they differ.

Table 4.2 presents various power estimations derived from different calculation methods for an AB/BA crossover trial, based on the setup outlined in example 1.1 The analysis incorporates a variance estimate of 1, obtained from a pilot trial, while power calculations are conducted using degrees of freedom set at 10 and 100,000, across sample sizes ranging from 2 to 30 Additionally, results obtained through equation 1.3 are included for comparison.

The analysis indicates that method 4.1 generally provides lower estimates compared to method 4.2, particularly when power is around 40% As power increases between 40% and 70%, the estimates from both methods remain closely aligned, while 4.1 yields slightly higher estimates for power levels exceeding 70% Although the significant discrepancy under 40% raises concerns, it is unlikely that trials would be designed with such low power, suggesting that both methods yield similar results for realistic trial designs Additionally, for very high degrees of freedom, method 4.2 converges towards the value defined by equation 1.3, whereas method 4.1 approaches equation 1.2 and may even surpass 1.3 Despite the small exceedance, it remains illogical for the expected power to exceed the actual power based on the distribution of sigma given s.

The study examines whether the variation in power estimation between two methods influences the calculated sample size across different effect sizes, represented as ∆ = δ/σ Below are the required sample sizes derived from various equations for a range of ∆ values across a selected number of m.

Table 4.3: Sample Sizes calculated for 90% power with 2.5% one-sided alpha

Table 4.4: Sample Sizes required for 80% power with 2.5% one-sided alpha

Table 4.5: Sample Sizes required for 70% power with 2.5% one-sided alpha

Table 4.6: Sample Sizes required for 60% power with 2.5% one-sided alpha

Table 4.7: Sample Sizes required for 50% power with 2.5% one-sided alpha

Table 4.8: Sample Sizes required for 40% power with 2.5% one-sided alpha

Table 4.9: Sample Sizes required for 30% power with 2.5% one-sided alpha

Table 4.10: Sample Sizes required for 20% power with 2.5% one-sided alpha

Table 4.11: Sample Sizes required for 10% power with 2.5% one-sided alpha

Table 4.12: Sample Sizes required for 2% power with 2.5% one-sided alpha

Tables 4.3 to 4.12 present the required sample sizes for an AB/BA crossover trial, calculated across various beta values (0.1 to 0.98) and deltas (0.1, 0.2, 0.5, and 1) Notably, the sample sizes derived from different methods are strikingly similar, particularly for beta values below 0.9, where results are nearly identical Even at a beta of 0.9, the differences remain minimal Significant divergence occurs only at a beta of 0.98, where one method provides substantially higher estimates Since trials are unlikely to target a power of 2% or lower, we can confidently conclude that equation 4.1 will yield sample size estimates comparable to Arithmetic method 4.2 for feasible trials.

Expected Power compared to Power calculations using point estimates

The expected power methods and 's-adjustment' methods for sample size calculation serve different purposes and should be chosen based on the specific objectives of the analysis For determining the likelihood of a trial achieving sufficient power, a point estimate derived from the percentile of the distribution of (σ|s,m) is recommended Conversely, if the goal is to obtain the most accurate estimate of power while considering all available information, the expected power method should be utilized.

Using E(σ|s,m) as an estimate for σ does not result in power calculations of E(1-β); in fact there is no multiplicative factor for s that will give the expected power using the traditional methods of power calculation.

To achieve a specific expected power, the multiplicative factor required for s depends only on m. Statisticians can use this relationship to estimate sample sizes for desired expected powers even with calculators or software that do not use cumulative non-central t distributions. For instance, to calculate a sample size giving an expected power of 80% using a conventional Normal-distribution-based equation, the observed s should first be multiplied by an appropriate factor.

In an AB/BA trial, AM 4.2 with s = 1, m = 13 and Δ = 0.2 gives a required sample size of 450 for an expected power of 80%; equation 1.3 with σ inflated to (2.285+(0.5/13))**(1/13) gives the same answer. Table 4.13 presents rough adjustments of this kind for other expected powers.

These guidelines serve as rough estimates and should not be considered precise calculations The equations were developed through an iterative mathematical method to provide approximations based on variables such as m and power.

Selecting pA

In binary data type sample size calculations, the parameter pA must be estimated from a pilot study or prior data, similar to how sigma is estimated for normal data However, unlike normal data, there are various approaches to accurately estimate or characterize the distribution of the true pA based on observed incidence, particularly when dealing with small sample sizes.

Different ways to calculate confidence intervals include the Wilson Score [Wilson, E B, 1927], Wald, and the Exact method [Clopper, C J and Pearson, E S, 1934].

For large samples, confidence intervals yield similar results; however, for smaller samples, the declared x% confidence intervals are often only nominal (Sauro & Lewis, 2005) The Exact method is conservative, ensuring that the confidence interval has at least the nominal probability of containing pA In contrast, the Normal-distribution based Wald method is generally less powerful than it suggests, despite its ease of calculation An adjusted version of the Wald method, however, provides some of the best results for very small sample sizes (Agresti & Coull, 1998).
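For reference, the Wilson score interval mentioned above can be computed as follows (a Python sketch):

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(x, n, conf=0.95):
    """Wilson score confidence interval for a binomial proportion,
    given x successes out of n trials."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p = x / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half

# An observed pA of 0.4 from a hypothetical pilot of 20 subjects
lo, hi = wilson_interval(8, 20)
print(round(lo, 3), round(hi, 3))  # 0.219 0.613
```

The wide interval for so small a pilot mirrors the point made above: a nominal 95% interval on 20 observations still leaves pA very uncertain.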

If a point estimate for pA is calculated from a sample, then a sample size of n where

n = (1/4)·(Z1-β / (x/100))²

will mean there is at least a (1-β) chance that the observed pA will be within x percentage points of the true pA.
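This is the binomial counterpart of the earlier sd-accuracy formula, with the worst-case variance pA(1-pA) ≤ 1/4. A Python sketch (Z again taken as the two-sided 95% quantile):

```python
from statistics import NormalDist

def n_for_pa_accuracy(x_percent, conf=0.95):
    """Sample size so the observed incidence is within x percentage
    points of the true pA with the given confidence (worst case p = 0.5)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return round(0.25 * (z / (x_percent / 100)) ** 2)

print(n_for_pa_accuracy(5))  # 384: within 5 percentage points
print(n_for_pa_accuracy(3))  # 1067: within 3 percentage points
```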

A larger sample size enhances our confidence that the observed incidence accurately reflects the true value of pA For trial designers who are uncertain about the observed rate, utilizing the upper bounds of a confidence interval can provide a more cautious estimate of pA.

As with normally distributed data, it is wise to be cautious about relying on a sample pA alone; it is more prudent to carry out power and sample size calculations that take into account the assumed conditional distribution of (pA | observed pA).

4.6 Methods of incorporating uncertainty over pA into sample size calculations

While there is no direct equivalent of equation 4.1 for binary outcomes, expressing the uncertainty about pA through confidence intervals allows quantiles of the distribution of the true pA to be recovered; these quantiles can then be fed into an arithmetic method akin to that of equation 4.2.

By calculating confidence intervals at a range of confidence levels and comparing their boundaries, the quantiles of a pdf for the true pA given the observed pA can be found. These quantiles are then used in power calculations to reflect the distribution of power induced by the distribution of pA. Whether the null hypothesis is interpreted as concerning log(OR) or pB makes a considerable difference to the estimated expected power.

Either the odds ratio or pB must be held constant in the power calculations, whichever is judged the more important quantity. If the OR is held fixed, quantiles of pB can be derived from the quantiles of the pdf of pA together with the fixed OR. Carrying out the power calculation at each pair of corresponding quantiles of pA and pB, equally spaced in probability, and taking the mean gives an approximation to the expected power. Results are more stable when the OR is held constant: if instead pB is fixed, some quantiles of pA may lie very close to pB, producing a more skewed pdf of power.

As with AM 4.2, the method implemented in my SAS and R programs uses 999 quantiles.
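
The following Python sketch illustrates the approach; the thesis programs are in SAS and R, and the function names, example figures (30/100 observed, OR = 2, 150 per group) and the normal-approximation power formula here are illustrative assumptions. Quantiles of the true pA are recovered by inverting Wilson score intervals at varying confidence levels, pB is obtained at each quantile from the fixed OR, and the power calculations are averaged.

```python
import math
from statistics import NormalDist

def wilson_bounds(x, n, conf):
    """Wilson score interval at confidence level `conf`."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    p = x / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return centre - half, centre + half

def pA_quantiles(x, n, k=999):
    """k quantiles, equally spaced in probability, of the distribution of
    the true pA given the observed incidence x/n: the u-quantile (u < 0.5)
    is the lower bound of the (1 - 2u) interval, and symmetrically the
    u-quantile (u >= 0.5) is the upper bound of the (2u - 1) interval."""
    qs = []
    for i in range(1, k + 1):
        u = i / (k + 1)
        if u < 0.5:
            qs.append(wilson_bounds(x, n, 1 - 2 * u)[0])
        else:
            qs.append(wilson_bounds(x, n, 2 * u - 1)[1])
    return qs

def approx_power(pA, pB, n_per_group, alpha=0.025):
    """Normal-approximation power of a one-sided two-group comparison."""
    z_a = NormalDist().inv_cdf(1 - alpha)
    se = math.sqrt((pA * (1 - pA) + pB * (1 - pB)) / n_per_group)
    return NormalDist().cdf(abs(pA - pB) / se - z_a)

def expected_power_fixed_OR(x, n, odds_ratio, n_per_group, k=999):
    """Average the power over the quantiles of pA, holding OR constant."""
    total = 0.0
    for q in pA_quantiles(x, n, k):
        odds_b = odds_ratio * q / (1 - q)
        p_b = odds_b / (1 + odds_b)
        total += approx_power(q, p_b, n_per_group)
    return total / k

# Observed incidence 30/100 in the pilot, target OR of 2, 150 per group:
print(expected_power_fixed_OR(30, 100, 2.0, 150))
```

Because the OR is held fixed, each quantile of pA maps to a distinct pB, so the averaged power varies smoothly across the quantiles rather than collapsing where pA approaches pB.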

Simulation-based power estimation

For more complicated trial designs, the relationship between sample size and power may be difficult to derive analytically; with modern computing power it is practical instead to simulate many trials and record the proportion that meet a given criterion. For example, consider again the crossover trial of Example 1.1 comparing the efficacy of new drug B against drug A, where the investigator wished to know the power to detect a clinically relevant difference of 1 with 20 subjects (10 per sequence) and a one-sided α of 0.025; the power for this design was calculated as 0.84844.

450,000 such trials were simulated in R, each with the same variances and group sizes as Example 1.1 and a true treatment effect of 1 unit in favour of treatment B. Each simulated trial was analysed by ordinary least squares to determine whether a significant difference between the treatment effects was detected (Table 4.14). The results apply equally to a random effects analysis: in a complete block crossover design the t-values and p-values for the treatment effect are identical under both models.

If the true power of each individual trial is 0.84844, then approximately 381,798 of the 450,000 trials would be expected to show a significant treatment difference. Table 4.14 shows that a significant difference was in fact detected 381,694 times, very close to the expected number. Treating the number of significant trials as Binomial, the 95% Clopper-Pearson confidence interval for the true power is (0.84724, 0.84917), which contains 0.84844.
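
The thesis simulations were run in R with the full crossover model of Example 1.1; the Python sketch below shows the same brute-force idea on a simplified AB/BA analysis. Period effects are omitted and the within-subject parameters are illustrative assumptions, so the power it estimates is not the 0.84844 of the example. Each simulated trial reduces to a one-sample t-test on the within-subject treatment differences.

```python
import math
import random
from statistics import mean, stdev

def simulate_power(n_subjects=20, delta=1.0, sd_diff=1.5,
                   t_crit=2.093, n_sims=20_000, seed=1):
    """Estimate power by brute force: simulate many trials and count the
    proportion whose one-sided t-test is significant. t_crit = t_{0.975,19},
    matching a one-sided alpha of 0.025 with 19 degrees of freedom."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(n_sims):
        # Within-subject differences (B minus A) for one simulated trial.
        d = [rng.gauss(delta, sd_diff) for _ in range(n_subjects)]
        t = mean(d) * math.sqrt(n_subjects) / stdev(d)
        if t > t_crit:
            significant += 1
    return significant / n_sims

est = simulate_power()
print(est)  # power estimate under these illustrative parameters
```

A Clopper-Pearson interval around the estimated proportion, as in the text above, quantifies the simulation error; more simulated trials narrow it at a rate of 1/√n_sims.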

In this way the power of the trial has been estimated by simulation as 381,694/450,000 ≈ 0.848, potentially more accurately than by the approximate formula of equation 1.2. Simulation is particularly useful for non-Normally distributed data, where the variance approximations underlying formula-based methods make them inexact.

Such brute-force simulation can give more accurate power estimates than approximate formulas. There are drawbacks, however: the estimate itself is subject to simulation error, so repeated runs will not give identical results, and regulatory bodies may look unfavourably on a non-standard method.

Trial designers should therefore adhere to standard sample size calculation methods unless there is substantial reason to doubt their applicability to the particular details of the trial at hand.

Conclusion: Summary and Discussion
