
Brief guidelines for methods and statistics in medical research




DOCUMENT INFORMATION

Basic information

Title: Brief Guidelines for Methods and Statistics in Medical Research
Author: Jamalludin Ab Rahman
Institution: International Islamic University Malaysia
Field: Community Medicine
Type: Book
Year of publication: 2015
City: Kuantan
Pages: 114
File size: 6.72 MB

Structure

  • Preface

  • Contents

  • 1 Planning a Research

    • Abstract

    • 1.1 Building Problem Statement

    • 1.2 Effective Literature Search

      • 1.2.1 Strategies (Planning)

      • 1.2.2 Search

      • 1.2.3 Screen

      • 1.2.4 Sort

      • 1.2.5 Summarise

    • 1.3 Choosing Best Study Design

      • 1.3.1 Observational Study

      • 1.3.2 Cross-Sectional Study

      • 1.3.3 Case-Control Study

      • 1.3.4 Cohort Study

      • 1.3.5 Experimental Study

    • 1.4 Sampling Terms

    • 1.5 Choosing Sampling Method

      • 1.5.1 Probability Sampling

      • 1.5.2 Simple Random Sampling

      • 1.5.3 Systematic Random Sampling

      • 1.5.4 Cluster Random Sampling

      • 1.5.5 Stratified Random Sampling

      • 1.5.6 Non-probability Sampling

    • 1.6 Calculating Sample Size

      • 1.6.1 Sample Size for Population-Based Study

      • 1.6.2 Sample Size for a Single Proportion

      • 1.6.3 Sample Size for a Single Mean

      • 1.6.4 Sample Size for Two Proportions

      • 1.6.5 Sample Size for Two Means

    • 1.7 Observations and Measurements

      • 1.7.1 Role of a Variable

      • 1.7.2 Level of Measurement

      • 1.7.3 Data Distribution

      • 1.7.4 Preparing Data Dictionary

      • 1.7.5 Validity and Reliability of Research Instrument

    • 1.8 Data Quality Control

    • 1.9 Plan for Statistical Analysis

    • 1.10 Critical Information in Research Proposal

  • 2 Analysing Research Data

    • Abstract

    • 2.1 Descriptive Statistics

      • 2.1.1 Describe Numerical Data

      • 2.1.2 Describe Categorical Data

    • 2.2 Analytical Statistics

      • 2.2.1 Concept in Causal Inference

      • 2.2.2 Hypothesis Testing

      • 2.2.3 State the Hypothesis

      • 2.2.4 Set a Criterion to Decide

      • 2.2.5 Choosing Suitable Statistical Test

      • 2.2.6 Making a Decision

    • 2.3 Comparing Means

      • 2.3.1 Compare One Mean

      • 2.3.2 Compare Two Means

      • 2.3.3 Compare More Than Two Means

      • 2.3.4 Compare Paired Means

    • 2.4 Comparing Proportions

      • 2.4.1 Compare Independent Proportions

      • 2.4.2 Compare Paired Proportions

    • 2.5 Comparing Ranks

      • 2.5.1 Compare Two Independent Nonparametric Samples

      • 2.5.2 Compare More Than Two Independent Nonparametric Samples

    • 2.6 Covariance, Correlation and Regression

      • 2.6.1 Correlation Coefficient Test

      • 2.6.2 Simple and Multiple Linear Regression

    • 2.7 General Linear Model

      • 2.7.1 ANOVA and ANCOVA

      • 2.7.2 MANOVA and MANCOVA

      • 2.7.3 Repeated Measures ANOVA

    • 2.8 Logistic Regression

    • 2.9 How to Analyse, in Summary

  • Datasets

  • References

  • Index

Content

Building Problem Statement

A well-crafted problem statement encapsulates the essence of a study, bridging past research and anticipated outcomes. It should be developed after conducting a thorough literature review. Before initiating the search for information, it is essential to have a preliminary understanding of the problem at hand. Begin with a basic problem statement, gather relevant references and data, and then refine the statement based on newfound insights.

The problem statement should clearly define the primary issue that prompts the study, detailing the reasons for its significance and the necessity of conducting the research. It must also outline the relationship between the relevant variables associated with the problem. Finally, the statement should conclude with a description of the anticipated outcomes of the study.

To effectively describe a problem, it is essential to visualize the relationships between variables. This can be achieved by illustrating the connection between an outcome and an explanatory variable, often represented in a simplified format such as a bubble chart or flow chart, commonly referred to as a conceptual framework.

Obesity, as an outcome, is closely linked to dietary factors. It is essential to define obesity clearly; it can be measured as a dichotomous variable, either present or absent. In this context, the presence of obesity is indicated by a "Yes" response.

1 A conceptual framework is not a causal diagram, but it is useful if causality is integrated into the construction of the diagram, especially in quantitative studies.

The research plan focuses on individuals with a Body Mass Index (BMI) of 30 kg/m² or higher. Diet is assessed through 24-hour dietary recall, quantifying caloric intake in kilocalories (kCal). The analysis indicates a direct correlation: as kCal consumption increases, so does the likelihood of obesity.

Obesity is influenced by various factors, including physical activity, calorie intake, and genetics. While some of these factors can be measured directly, genetic influences are more complex to assess. A notable indicator of genetic predisposition to obesity is the presence of obesity in first-degree relatives, although this method is not entirely accurate. This relationship can be visually simplified as shown in Fig. 1.4.

In research, it is essential to clarify the focus of our inquiry regarding obesity. Are we aiming to identify significant factors linked to obesity, or are we seeking to establish that high calorie intake is an independent contributor to this condition?

Fig. 1.2 Relationship between a factor and outcome

Fig. 1.3 Relationship between diet and obesity

Fig. 1.4 Relationship of obesity with calorie intake, physical activity and family history of obesity

Calorie intake is a crucial factor linked to obesity, remaining significant even when considering other influences like physical activity and family history. The relevance of these additional factors to obesity is secondary.

After controlling for physical activity and family history, it is crucial to clarify our research objectives. While the statistical analysis may appear similar, the interpretation varies significantly. Initially, we aim to identify various factors associated with obesity; later, our focus narrows specifically to the relationship between calorie intake and obesity. Recognizing physical activity and family history as confounding variables highlights the importance of clearly defining our objectives to effectively describe the conceptual framework.

Identifying key variables and understanding their relationships can simplify the construction of a problem statement. However, it is important to note that not all research necessitates a complex statement; in many clinical experiments, the primary goal may simply be to demonstrate that a new intervention is more effective.

Effective Literature Search

Strategies (Planning)

The literature review is crucial for research success, and it becomes more manageable when we clearly define our objectives. Establishing a conceptual framework or problem statement aids in identifying dependent and independent variables. Additionally, documenting authors' names along with the relevant domain of interest, such as epidemiology, therapeutics, diagnostics, or prognostics, can enhance the search for pertinent references. It is also essential to reference authoritative sources that have been frequently cited in the field.

Search

In this phase, we conduct the actual search utilizing online resources like PubMed Central, Google Scholar, and specific journal websites such as BMJ, Lancet, and JAMA. It is essential to use the specific keywords identified in the previous step. Additionally, bibliographic managers like EndNote or Mekentosj Papers can help in locating references. To refine the results, we should apply specific filters, such as limiting the search to articles published within the last 5 years only, or limiting it to certain study designs or even languages.

Screen

After applying filters, we may still find hundreds, if not thousands, of articles that require screening for relevance. To expedite this process, we initially review the titles and tag those that seem pertinent. Next, we examine the abstracts of the selected articles, and if they align well with our research, we proceed to obtain the full texts.

Sort

To effectively utilize the selected articles, we must categorize them based on the type of information they provide, such as those useful for the introduction, design justification, or supporting the choice of statistical tests. Additionally, prioritizing articles by their relevance to our research is essential, as it may not be feasible to read every single one.

Summarise

It is essential to read each article according to its significance, and summarizing articles during the reading process can be highly beneficial. Creating a summary table that includes the first author's name, year of publication, study design, sample size, instruments used, statistical analyses, and results will aid in organizing the information effectively. After summarizing, it is advisable to reorganize the articles according to the insights gained. A thorough literature search enhances our understanding of our objectives and expectations.
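The summary table described above can be sketched programmatically. A minimal Python example with the columns the text suggests; the single row shown is hypothetical, not drawn from any real study:

```python
import csv
import io

# Columns suggested in the text; the row below is made up, for illustration only.
fields = ["first_author", "year", "design", "sample_size",
          "instrument", "analysis", "result"]
rows = [
    {"first_author": "Smith", "year": 2014, "design": "cross-sectional",
     "sample_size": 412, "instrument": "24-h dietary recall",
     "analysis": "logistic regression",
     "result": "higher kCal intake associated with obesity"},
]

# Write the table as CSV so it can be sorted and extended as reading progresses.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fields)
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

A spreadsheet or reference manager serves the same purpose; the point is a fixed set of columns filled in while reading.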

Choosing Best Study Design

Observational Study

Observational study designs are categorized into three main types: cross-sectional, case-control, and cohort studies, as illustrated in Fig. 1.7. The key aspect of these designs is to clearly identify what is being measured or observed.

Cross-Sectional Study

In a cross-sectional study, both the outcome and the influencing factor are observed simultaneously. For instance, when examining the relationship between obesity and diet, researchers interview participants about their dietary history and then measure their height and weight to assess obesity status. This approach presents a challenge, as it does not allow for determining whether the reported diet was the same prior to the onset of obesity. Consequently, the inability to separate the observation of the factor from the outcome limits the capacity to establish causality in cross-sectional studies.

Table 1.1 Guide in choosing the best design (columns: Objective, Cross-sectional, Case-control, Cohort, Experimental)

Fig. 1.7 Types of observational studies: (a) cross-sectional, (b) case-control, (c) cohort

Case-Control Study

A case-control study involves two groups: cases, which have the outcome of interest (e.g., obesity), and controls, which do not. In this retrospective design, outcomes are predetermined, and researchers focus on measuring factors or exposures associated with these outcomes rather than the outcomes themselves. This backward observation relies on the recall capability of respondents, as historical values cannot be analyzed through specimens. Consequently, case-control studies are susceptible to measurement biases, particularly recall bias.

In a case-control study, the number of participants with and without the outcome is predetermined, making the measurement of prevalence unnecessary. This design inherently involves selecting samples based on the outcome of interest, rendering prevalence calculations irrelevant.

Cohort Study

A cohort study is a prospective research design that involves observing participants over time to measure specific outcomes. The study can begin either in the present or in the past, with the latter referred to as a retrospective cohort study. A crucial requirement is that participants must be free from the outcome of interest at the study's inception. For instance, when investigating the causes of obesity, the study should start with non-obese individuals and follow them for a defined period. Additionally, researchers can categorize participants based on specific exposures related to obesity, allowing for a more detailed analysis of the relationship between these exposures and the outcomes observed.

To investigate the impact of a sedentary lifestyle on body weight, we can recruit non-obese individuals with sedentary habits and compare them to non-obese individuals with active lifestyles. This study could involve contrasting office workers with those employed in physically demanding jobs, such as construction. After a designated period, such as ten years, we analyze and compare the weight changes between these two groups.

In this study, we compare the obesity rates among office workers and construction workers over a decade. It is important to note that during this period, some participants may relocate, while others might choose not to continue their involvement in the study.

A retrospective study should not be mistaken for one that relies solely on old records; rather, it refers to the observation of data in a backward manner. The design of a study is independent of the data source, meaning that historical records can be utilized in various study designs, including cross-sectional studies.

Some participants may therefore not be followed up. This is the common disadvantage of a cohort study: loss to follow-up (attrition). There is also a possibility that some office workers change their occupation, and the same goes for the labourers. If the problem is not serious and the numbers are small, we can drop those participants from the group and compare only those remaining. A cohort study is able to show a causal relationship because it has temporal association.

In a cohort study, we begin with a group of individuals who have not yet experienced the outcome and monitor them over time to observe its occurrence. This approach offers a significant advantage over cross-sectional studies, which lack the ability to establish temporal relationships. Additionally, case-control studies do not effectively differentiate between exposure and outcome, limiting their capacity to draw causal inferences.

Experimental Study

This type of study requires an experimental approach, which can involve animals, patients, or communities. The ideal experimental study includes a control group that does not receive the intervention. Additionally, many studies feature multiple treatment groups to assess the effects of a drug at various dosages.

An essential aspect of experimental studies is the careful selection of subjects, adhering to strict criteria to ensure that only individuals with specific conditions, such as dyslipidaemia for lipid-lowering trials, are included. This approach minimizes confounding, ensuring that factors like age, gender distribution, and illness severity are consistent across groups. Participants are then randomized into treatment and control groups, with the control group receiving standard medication and the treatment group receiving the new drug. To investigate the dose-response relationship, treatment groups are further categorized based on varying dosages of the new medication.

Sampling Terms

Before we can start selecting the study subjects, we should plan the sampling strategy. This can be done by specifying these five terms:

Selection criteria in research can be categorized into inclusion and exclusion criteria. Inclusion criteria specify the characteristics required for participants to be part of the study, while exclusion criteria outline the characteristics that disqualify potential participants. It is important to avoid stating both male as an inclusion criterion and female as an exclusion criterion, as the inclusion of males inherently implies the exclusion of females from the study.

The target population refers to the broader group to which research results are inferred, while the study population is a representative subset that researchers can access. For instance, in the National Health and Morbidity Survey (NHMS) III conducted in 2006, the target population included all Malaysians, whereas the study population was limited to households, excluding those in institutional settings such as hostels, army camps, or correctional facilities.

A sampling frame refers to the comprehensive list of sampling units, the units from which the measurements are taken. In the NHMS, the primary sampling units identified were Enumeration Blocks (EBs) and Living Quarters (LQs).

An EB is defined as a geographical area artificially created to contain about 80 to 120 living quarters, delimited by various types of boundaries. These can include natural boundaries like rivers, administrative boundaries such as mukim or district lines, man-made boundaries like roads and railways, and even imaginary lines connecting locations on a map. In some cases, EBs lack clear boundary definitions, encompassing only a few localities or villages that are difficult to access by road, such as Orang Asli settlements in Peninsular Malaysia and remote areas in Sabah and Sarawak (Department of Statistics Malaysia 2014).

5 The third 10-yearly national survey on health by the Ministry of Health Malaysia.

In the NHMS, the sampling frame consisted of two components: a list of Enumeration Blocks (EBs) and a list of Living Quarters (LQs). Sampling was conducted initially on EBs, followed by LQs. Within each selected LQ, all household members were interviewed and examined, serving as the observation units for the study.

We can apply these sampling terms in studies, especially those aiming to represent a population. It is important that we choose a probability (random) sampling method.

Choosing Sampling Method

Probability Sampling

Probability sampling ensures that every sample has an equal chance of selection, making the process truly random. This randomness implies that it should be impossible to replicate the exact same samples using the same technique.

Fig. 1.9 Types of sampling methods

Simple Random Sampling

Simple random sampling is the most straightforward method for selecting a subset from a larger group. For instance, if you have 20 subjects and need to sample 4, you can write each name on a small piece of paper, mix them in a bowl, and draw four at random. Alternatively, you can utilize a random number table by assigning numbers to the subjects and selecting them randomly, for example by dropping a pencil onto the paper and choosing the nearest number. Regardless of the method, the key is that the selection process is entirely random, ensuring unbiased results. For example, when selecting 4 patients from a list of 20, you can first sort their names alphabetically, assign numbers from 1 to 20, and then proceed with the random selection.

We dropped the tip of the pencil onto the paper with the random number table without looking at it. Say the pencil pointed to a location near Row 13 and Column 24, where the nearest number is 68. Since our population is only 20 names, the number we use should not exceed 2 digits, so we take the two-digit number 68; and since we have only up to 20 subjects, we use the last digit and choose number 8 instead. In fact, all numbers in the table are random, so we do not need to repeat the sampling four times for 4 samples; we can simply select the subsequent numbers. We do, however, need to decide which direction to move before sampling, to avoid bias. Say we plan to move to the left: we select the numbers 68, 73, 65 and 81, and use 8, 3, 5 and 1. Remember that the key point is that we should not be able to replicate this process. This is crucial when we use software to generate random numbers: software that can repeat exactly the same order is not truly random.
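The pencil-and-table procedure has a direct software equivalent. A minimal Python sketch (subject names are placeholders), using the operating system's entropy source so that the draw cannot be replayed, in line with the text's warning about reproducible "random" software:

```python
import random

# The 20 subjects from the example, here as placeholder names.
subjects = [f"Subject {i:02d}" for i in range(1, 21)]

# SystemRandom draws from the OS entropy pool, so the sequence cannot be
# reproduced by re-running the program, which is what the text requires.
rng = random.SystemRandom()
sample = rng.sample(subjects, k=4)  # every subset of 4 is equally likely

print(sample)
```

Seeded generators (e.g. `random.seed(42)`) are convenient for debugging but, as the text notes, a procedure that can repeat exactly the same order is not truly random sampling.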

Systematic Random Sampling

The main difference between simple and systematic random sampling is the frequency of the 'random' sampling process.

Fig. 1.10 Simple random sampling

To randomly sample 4 subjects from a total of 20 using simple random sampling, the sampling process must be performed four times unless a random number table is used. In systematic random sampling, only a single random selection is required. A straightforward approach involves sorting the subjects, possibly by their names, and then dividing them into four groups to facilitate the sampling process.

In this example, we organize 4 groups, each containing 5 subjects. A random number between 1 and 5 is selected; for instance, if number 3 is chosen, we take the subject positioned at number 3 from each group.
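The scheme just described, one random start and then the same position in every block of 5, can be sketched as follows (subject names are placeholders):

```python
import random

# 20 subjects, sorted (e.g. by name) and conceptually split into 4 groups of 5.
subjects = sorted(f"Subject {i:02d}" for i in range(1, 21))

k = len(subjects) // 4        # block size: 20 / 4 = 5
start = random.randint(1, k)  # the single random draw, a number between 1 and 5

# Take the subject at position `start` within each block of 5.
sample = [subjects[i] for i in range(start - 1, len(subjects), k)]
print(sample)
```

Only one random number is generated, in contrast to the four independent draws of simple random sampling.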

Cluster Random Sampling

Cluster random sampling involves distributing subjects into homogeneous groups known as clusters. For instance, if we aim to represent a state with four districts of comparable size and demographics, we can randomly select one district to represent the entire state. Once a district is chosen, we may sample the entire population within it or a subset for logistical reasons. This method is cost-effective, as it allows for concentrated sampling in one district rather than spreading efforts across all four districts.

Fig. 1.11 Example of a random number table (taken from Hill AB (1977) A Short Textbook of Medical Statistics. J. B. Lippincott Company)

When clusters are not perfectly homogeneous, a common occurrence, the clustering biases the variance estimate; this is known as the design effect (Killip et al. 2004). This bias must be considered in both sample size calculations and data analysis.

Stratified Random Sampling

Stratified random sampling involves dividing subjects into groups known as strata. Unlike cluster random sampling, all strata must be represented in the sample, and strata are defined based on specific characteristics such as sex, age group, and location.

Using a similar example, to sample 4 out of 20 people with males and females equally distributed, 2 samples from each sex shall be randomly selected (Fig. 1.14). Therefore, both strata will have representatives.

Non-probability Sampling

Non-probability sampling plays a crucial role in research, particularly in clinical trials. Researchers do not always need to rely on random samples; for instance, when studying diabetic patients with specific conditions, investigators can directly enrol qualified patients from their practice without needing to create a comprehensive list of potential subjects.

Patients can be selected for a study if they meet specific inclusion and exclusion criteria. Unlike convenience sampling, which involves selecting participants without any guidelines, purposive sampling requires a defined list of criteria that the selected patients must fulfil. Additionally, quota sampling is a method that halts recruitment once a predetermined number of samples has been collected.

Calculating Sample Size

Sample Size for Population-Based Study

This type of study aims to generalize findings to a larger population, which may encompass a district, state, or country, typically defined by specific geographical boundaries. For instance, we may conduct research to assess the prevalence of hypertension within a state or to characterize diabetic patients across an entire country.

The sample size is influenced by several key factors, as outlined in Table 1.2. A crucial element is the expected outcome, which represents the researcher's anticipated value for the primary result. This value can be derived from previous studies or, in the absence of such data, must be estimated by the researcher based on their understanding of the outcome.

Desired precision refers to the acceptable variation around the expected outcome. For instance, if national data indicates a 35 % prevalence of hypertension, we might anticipate similar results in the district we study. However, actual outcomes may differ, necessitating a best estimate of the variation. If previous research suggests a prevalence range of 30 % to 40 %, we can set the precision of our estimate at approximately 5 %.

In population-based research, a more precise expectation requires a larger sample size, similar to hitting a bull's eye in archery: the smaller the target, the more accurate the shot must be, and the archer may need to release more arrows to hit it. Likewise, researchers must account for greater variability in heterogeneous populations, particularly regarding socio-demographic characteristics.

Table 1.2 Factors that affect sample size calculation: (1) expected outcome, (2) desired precision level (margin of error), (3) design effect.

The more heterogeneous the population (in terms of socio-demographic characteristics, for example), the less likely we are to obtain a precise result. We should expect higher variation; hence, a larger sample is required to achieve precise estimates.

The design effect (Deff), introduced by Kish in 1965, refers to the impact of the sampling design on the variance of estimates. When the sampling method deviates from simple random sampling, adjustments must be made to account for this effect, necessitating a larger sample size for studies with a high design effect.

All research that tries to represent a population must consider all this information in the sample size calculation.

The choice of sample size calculation is influenced by the level of measurement of the primary outcome. In obesity studies, the main outcome is often assessed categorically as either obese or not obese (dichotomous). Alternatively, obesity can be quantified through body fat percentage, which is a numerical measurement.

Sample Size for a Single Proportion

A single proportion study focuses on measuring a specific proportion, such as the prevalence of hypertension. To determine the sample size needed for this type of study, the following formula can be used: n = z² p (1 − p) / d². In this formula, p represents the expected outcome, d denotes the required precision, and z is the z-value corresponding to the desired confidence level. Typically, α is set at 0.05, which gives a z-value of 1.96.

So, using the same hypothetical example, expecting a prevalence of hypertension of 35 % with a precision of 5 % and an α of 0.05, the sample size n is: n = 1.96² × 0.35 × (1 − 0.35) / 0.05² ≈ 349.6
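The calculation can be reproduced in a few lines of Python; a sketch of the formula as given, not a substitute for dedicated sample size software:

```python
def n_single_proportion(p: float, d: float, z: float = 1.96) -> float:
    """Minimum sample size to estimate one proportion: n = z^2 * p * (1 - p) / d^2."""
    return z ** 2 * p * (1 - p) / d ** 2

# The hypertension example: expected prevalence 35 %, precision 5 %, alpha = 0.05.
n = n_single_proportion(p=0.35, d=0.05)
print(round(n, 1))  # 349.6
```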

6 Levels of measurement include nominal (dichotomous), ordinal and continuous. Please refer to Sect. 1.7.2.

The formulae provided for calculating sample size are suggestions rather than ideals, and the full list is extensive. These formulae are simplified for most research purposes; detailed considerations, such as testing for difference versus equality, are not covered in this discussion.

8 Confidence level is the degree of confidence that the sample selected represents the true value (of the population parameter).

This figure holds only for simple random sampling; otherwise, the sample size must account for the design effect (Deff), here estimated at 1.5 based on prior research. Consequently, with Deff at 1.5, the required sample size increases to approximately 524.4 (349.6 multiplied by 1.5).

When determining the final sample size for a household survey, it is also crucial to consider the response rate, as some individuals may refuse to participate or may not be home during visits. Anticipating a non-response rate of 20 % means increasing the sample size by the same percentage, giving a target sample size of approximately 629.3. If the study does not involve stratification, this figure can serve as the minimum target; if stratification is included, the number of strata must be factored into the calculation. Remember that the planned sample size is always an estimate, so it should be rounded up, here to 630.
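The chain of adjustments, base size, then design effect, then non-response inflation, can be checked in code (following the text's convention of multiplying by 1 plus the non-response rate, and rounding the final figure up):

```python
import math

n0 = 349.6            # base sample size for the single-proportion example
deff = 1.5            # design effect, as estimated from prior research

n1 = n0 * deff        # adjust for the design effect -> 524.4
non_response = 0.20
n2 = n1 * (1 + non_response)  # inflate by 20 % for non-response -> ~629.3

print(math.ceil(n2))  # 630: always round the final sample size up
```

Some texts instead divide by (1 − non-response rate); the multiplication used here follows the worked example above.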

Sample Size for a Single Mean

For research with a continuous outcome measurement, such as fasting blood glucose expressed in mmol/L, the suggested formula for the sample size calculation is: n = z² σ² / d². In this formula, σ represents the standard deviation, z is the z-value corresponding to the desired confidence level (1.96 for α = 0.05), and d denotes the required precision.

To estimate the mean blood glucose with a standard deviation of 0.5 mmol/L and a required precision of 0.1 mmol/L at the 95 % confidence level, the minimum required sample size n is: n = 1.96² × 0.5² / 0.1² ≈ 96.04

Similarly to the previous exercise, if the design effect is 1.5 and the non-response rate is 20 %, with no stratification involved, the final sample size is 172.9, rounded up to 173.

In a heterogeneous population, a higher standard deviation σ necessitates a larger sample size, while a stricter precision requirement, indicated by a smaller value of d, also increases the sample size.
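The single-mean formula can likewise be scripted; the σ = 0.5 and d = 0.1 values below are read off the worked formula in the text:

```python
def n_single_mean(sigma: float, d: float, z: float = 1.96) -> float:
    """Minimum sample size to estimate one mean: n = z^2 * sigma^2 / d^2."""
    return z ** 2 * sigma ** 2 / d ** 2

# Blood-glucose example: sigma = 0.5 mmol/L, precision d = 0.1 mmol/L, alpha = 0.05.
n = n_single_mean(sigma=0.5, d=0.1)
print(round(n, 2))  # 96.04; applying Deff 1.5 and 20 % non-response gives ~172.9
```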

Sample Size for Two Proportions

When conducting research that involves comparing values, such as between treatment and control groups or pre- and post-intervention, it is essential to use the formula appropriate to the chosen statistical test. With many options available, a practical approach is to use sample size software; I recommend the PS: Power and Sample Size Calculation tool by Dupont and Plummer Jr (1990).

Consider a case-control study comparing smoking behaviour between normal subjects and lung cancer patients. Because smoking status is dichotomous, we use the Dichotomous tab. If the smoking rates are expected to be 30 % in one group and 40 % in the other, these values are entered as shown in Screen 1.1. Typically, we assume a sample size ratio of 1 between the two groups. The analysis employs the χ² test, corrected by Fisher's Exact Test. Although a case-control design is used, a prospective approach is selected here. Upon calculation, as illustrated in Screen 1.2, approximately 380 subjects are needed per group.

That is about 760 subjects in total. But do not forget to account for non-response, as mentioned previously.
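PS computes this internally, but the standard normal-approximation formula for two proportions can be hand-coded as a sanity check. The Fleiss continuity correction shown brings the figure close to the Fisher-corrected value reported by the software (the exact correction PS applies may differ slightly):

```python
import math

def n_two_proportions(p1: float, p2: float,
                      z_alpha: float = 1.96, z_beta: float = 0.8416) -> float:
    """Per-group sample size for comparing two independent proportions
    (pooled normal approximation, equal group sizes, 80 % power by default)."""
    p_bar = (p1 + p2) / 2
    term = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
            + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return term ** 2 / (p1 - p2) ** 2

# Smoking example: 30 % vs 40 %.
n = n_two_proportions(0.30, 0.40)  # ~356 per group, uncorrected

# Fleiss continuity correction, analogous to the Fisher correction in PS.
n_corr = n / 4 * (1 + math.sqrt(1 + 4 / (n * abs(0.40 - 0.30)))) ** 2

print(math.ceil(n), math.ceil(n_corr))  # 356 376
```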

Sample Size for Two Means

If we wish to compare the mean blood sugar between males and females, with an expected difference of 0.5 mmol/L and an estimated within-group standard deviation of 1 mmol/L, we can again use the software.

The calculation indicates that 64 participants are needed per group, totalling approximately 130 subjects. Considering a potential 20 % non-response rate, it is advisable to aim for around 160 subjects in total.
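A quick hand calculation for two means, using the normal approximation (PS, which uses the t distribution, reports the slightly larger 64 per group quoted above):

```python
import math

def n_two_means(delta: float, sigma: float,
                z_alpha: float = 1.96, z_beta: float = 0.8416) -> float:
    """Per-group sample size for comparing two independent means:
    n = 2 * sigma^2 * (z_alpha + z_beta)^2 / delta^2  (80 % power by default)."""
    return 2 * sigma ** 2 * (z_alpha + z_beta) ** 2 / delta ** 2

# Blood-sugar example: expected difference 0.5 mmol/L, within-group SD 1 mmol/L.
n = n_two_means(delta=0.5, sigma=1.0)
print(math.ceil(n))  # 63 per group by this approximation
```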

Screen 1.1 Calculating the sample size for comparing two proportions using software

The within-group standard deviation refers to the variability of blood sugar levels within each group. When comparing two groups, it is common to encounter different standard deviations; for the sample size calculation, the largest standard deviation is typically used. This value is derived from previous studies or pilot research, providing a reliable estimate for the calculation.

We chose Independent in the Design box because the mean blood sugar in males is not related to the mean blood sugar in females.

When comparing blood sugar levels before and after treatment, it is essential to use a paired analysis, as the later level is influenced by the earlier one. Because each subject serves as his or her own control, a paired design usually results in a smaller sample size. Additionally, the standard PS software accommodates only two groups; for more complex sample size calculations, consulting a statistician and utilizing advanced software is recommended.

Screen 1.2 The sample size calculated

Observations and Measurements

Role of a Variable

In research, each variable must serve a specific purpose, and irrelevant variables should be avoided to ensure the research objectives are met. When exploring causality, at least two variables are involved: the dependent variable, which is the outcome, and the independent variable, which acts as the factor or exposure. Identifying the distinction between dependent and independent variables is crucial for accurate analysis and interpretation of results.

1.7.2 Level of Measurement

Variables are measured in different ways; some can only be counted and expressed as percentages, while others offer more detailed information through statistical measures like mean, median, and mode.

1 Nominal—Also known as dummy coding. The variable has different categories which are mutually exclusive but not ordered. It shows a qualitative difference between categories. Observations are countable (frequency). We can describe the mode but not the mean or median. A nominal variable with two values is known as dichotomous, e.g. gender, race.

2 Ordinal—A variable that shows rank or order. The distance between ranks is not measurable. We can describe it using count, mode and median. The mean is used in many studies but it is conceptually inappropriate. E.g. cancer stage, Likert scale or pain score.

3 Interval—The variable has degree of difference but not ratio. The main characteristic is that it has no absolute zero, e.g. a temperature of 100 °C is hotter than 50 °C but it is not twice as hot, and 0 °C does not mean no temperature. We can describe it by mean, median and mode.

4 Ratio—The variable has all the properties of interval, plus a true zero, so both differences and ratios are meaningful. We can describe it by ratio, mean, median and mode. E.g. Hb, blood sugar, weight, etc.

Understanding levels of measurement is crucial because it determines how we summarise data and which statistical methods we employ. In the hierarchy of measurement, interval and ratio data, which are numerical, carry more information than categorical data. Importantly, we can use different levels of measurement for data collection and for analysis. For instance, we may record age in years but categorise it as "young" or "old" for reporting, using 60 years as the cutoff. Other variables such as BMI and blood glucose can likewise be analysed categorically.

When collecting data, it is essential to gather it at the highest level of measurement, allowing for categorization at a later stage Once variables are recorded as categorical, they cannot be reverted to their numerical form, making it crucial to carefully consider the type of variable during data collection.
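As a small illustration of this point (the 60-year cutoff follows the text; the function and values are made up), a numerical age can always be collapsed into categories at analysis time, but the exact age cannot be recovered from the category:

```python
def age_group(age_years, cutoff=60):
    """Collapse a numerical age into a reporting category.
    The 60-year cutoff follows the example in the text."""
    return "old" if age_years >= cutoff else "young"

ages = [34, 61, 58, 72]          # collected at the highest level: exact years
groups = [age_group(a) for a in ages]
print(groups)                    # ['young', 'old', 'young', 'old']
# The reverse is impossible: from 'old' alone we cannot tell 61 from 72.
```
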

1.7.3 Data Distribution

To analyse numerical measurements, we must assess the data distribution, which can be characterised by central tendency and dispersion. Central tendency indicates where most data points cluster around the centre of the distribution, while dispersion reveals the extent of data spread. A more effective way to visualise a data distribution is a histogram.

When we measure the body weight of 100 random residents of a village, for example, there will be 100 observations (Data 1.1).

The average body weight is 49.7 kg, with a standard deviation of 2.0 kg. The data can be visualised in a histogram with a fitted normal curve illustrating a typical normal distribution. The distribution has a single peak (mode) that coincides with both the mean and the median, creating a symmetrical bell-shaped curve.

Fig 1.17 Distribution of body weight (kg)

The characteristics of a normal distribution are: it is symmetrical and bell-shaped; it has a single peak; and its mean, median and mode coincide at the centre.

In SPSS, to check the distribution of data, we use the Explore command (Screen 1.5).

SPSS Analysis: Check data distribution

4 Transfer wt (weight measured in kg) to Dependent List

Screen 1.5 How to describe numerical variable using Explore

The Descriptives table reveals that the average weight is 49.7 kg, with a standard deviation of 2.0 kg. The skewness is 0.049 and the kurtosis is −0.492. Together with the histogram (Fig. 1.17), these statistics indicate that the body weights are normally distributed.
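The same descriptive checks can be reproduced outside SPSS, for example with scipy; the data below are a small made-up symmetric sample, not the book's 100 observations:

```python
import numpy as np
from scipy.stats import skew, kurtosis

# Made-up body weights (kg), deliberately symmetric around 50
weights = np.array([46, 48, 48, 49, 50, 50, 50, 51, 52, 52, 54], dtype=float)

print(round(weights.mean(), 1))        # 50.0  (mean)
print(round(weights.std(ddof=1), 1))   # 2.2   (sample SD)
print(round(skew(weights), 3))         # 0.0   (symmetric, no skew)
print(round(kurtosis(weights), 3))     # -0.457 (Fisher kurtosis; 0 for a normal curve)
```

Values of skewness and kurtosis near zero, as in the book's example, support a normality assumption.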

Data that do not meet these characteristics can be transformed, for example by logarithm, exponentiation or inversion. Alternative ways to describe and analyse such data are discussed in Sect. 2.1.

1.7.4 Preparing Data Dictionary

A data dictionary serves as essential documentation for researchers, data enumerators and statisticians, ensuring that all terms used in the research are standardised and consistently interpreted. All variables necessary for the research, derived from the conceptual framework, should be identified and defined. This careful definition of variables promotes clarity and a uniform understanding among everyone involved in the research.

Output 1.1 Descriptive statistics for weight

There is no one standard template for a data dictionary. However, you can always find such a document from most major studies.10 The proposed information required for a data dictionary is presented in Table 1.3.

1.7.5 Validity and Reliability of Research Instrument

In research, we use various instruments to collect and measure variables. Often these are questions posed to study participants, such as age, smoking habits, medical history, or daily routines like exercise. We also employ apparatus such as weighing scales for body weight, sphygmomanometers for blood pressure, and blood analysers for biochemical measurements.

Table 1.3 Suggested information for data dictionary

1 Name—The name required by computers, such as in a database or statistical software. It can be one short word, e.g. agecat for Age Category

2 Label—The name that can appear in a table, graph or report

3 Operational definition—How the variable is defined in this research, with supporting references. For example, hypertension may be defined as diagnosed through verified medical records or prescribed medications, or a systolic blood pressure of 140 mmHg or higher and/or a diastolic blood pressure of 90 mmHg or higher

4 Instrument—Where applicable, the instrument used, specifying the brand and calibration method as necessary; the credibility of the model should be supported by established documentation

5 Level of measurement—Specify whether it is nominal, ordinal or continuous

6 Code—If the variable is categorical, the options should be specified, e.g. Gender: Male = 1, Female = 2

7 Unit—If the variable is numerical, specify its unit, e.g. mmol/L, mg/dL

8 Precision—How precisely the variable is measured, e.g. age to the nearest 1 year, income to the nearest RM100

9 Data linkage—Crucial when variables are interconnected. For instance, if a respondent is male, the question on pregnancy should carry a missing value; BMI is derived from weight and height

10 The NHANES study provides a comprehensive data dictionary at http://www.cdc.gov/nchs/data_access/data_linkage/mortality/restricted_use_linked_mortality.htm, and the Avon Longitudinal Study of Parents and Children at the University of Bristol offers one at http://www.bris.ac.uk/alspac/researchers/data-access/data-dictionary/.

In research, it is essential to use valid and reliable instruments, whether questionnaires or devices such as X-ray machines and blood analysers, to measure what we intend to measure, such as blood sugar levels or fractures.

A valid instrument measures the true value: a sphygmomanometer should read 120/80 mmHg when the blood pressure really is 120/80 mmHg, and a body fat analyser should report 30 % when the body fat truly is 30 %, not 15 %. If the device malfunctions or improper technique is used, readings will be inaccurate; even a well-functioning device can yield wrong results if proper procedures are not followed, for example when a patient wears conductive jewellery during a body fat measurement. Validity also applies to questions: when asking about a patient's occupation, specify the duration, e.g. "What is or are your occupation(s) for the last 12 months?", so that the occupation is relevant to the outcome being studied; a job started just yesterday is unlikely to influence the result.

A reliable instrument consistently delivers the same result: blood pressure readings taken several times within a short period should be similar. If the readings fluctuate widely, the device may be faulty.

Measuring subjective behavioural variables such as stress, happiness, attitude and awareness often requires multiple questions to obtain accurate responses. Developing such a questionnaire involves a detailed validation procedure, including identifying the content, constructing the questionnaire and confirming its performance. A full discussion of how to conduct a validation study is beyond the scope of this book.

Ensuring the validity of physical research instruments is straightforward, provided the same device is used consistently throughout the study. The instrument should have documented validity and reliability from the manufacturer, meeting the standards set by the relevant authorities. Proper procedures must be followed, including correct preparation and positioning of subjects; for instance, participants must fast appropriately before fasting blood sugar is measured. The instrument should be calibrated regularly, and all procedures should be documented in the research report.

For an instrument to be valid, it must also be reliable; without reliability, validity cannot be established. However, a reliable instrument may still be invalid, consistently producing the same incorrect result, such as a badly calibrated weighing scale. A common analogy for validity and reliability is archery, where the goal is to hit the X ring at the centre of the target for maximum points. An archer who hits the centre every

time is both valid and reliable. If he hits the centre a few times and misses many times, then he is not considered a good athlete; the arrow may hit the centre by chance and not because he is good (not valid and not reliable). If he is a good archer but fails to adjust the bow properly, or has a problem seeing the board clearly, he may hit the same spot all the time, but the spot is not the centre: reliable but not valid (Fig. 1.18).

1.8 Data Quality Control

Accurate data measurement is essential, but it is equally important to ensure that the data is recorded correctly Whether using paper-based or electronic data entry methods, implementing thorough quality checks at every level is crucial for maintaining data integrity.

To ensure accurate data entry, a field supervisor should be assigned to oversee the process While it is ideal for the supervisor to review all entries, randomly checking a portion of the data is a practical alternative If numerous errors are detected, the enumerators must be informed, and all their records should be thoroughly examined.

Utilizing electronic data entry can significantly streamline the data collection process by allowing for the implementation of validation rules within the record form In Malaysia, the identification number (MyKad) provides essential information regarding respondents' age and gender, enabling accurate verification For instance, the system can issue a warning if an enumerator inputs an age exceeding 100 years, prompting them to confirm the entry to avoid errors Additionally, electronic devices facilitate the enforcement of skip questions, such as those regarding the last menstrual period for female respondents, allowing male respondents to bypass irrelevant questions However, it is crucial to consider the practicality of using technology in the study area, as reliable battery life and internet access may be necessary, and there could be limitations due to a lack of electricity or connectivity.

Fig 1.18 Analogy for validity and reliability

Data collection in the field is commonly performed using paper-based questionnaires, which are later entered into a computer To enhance data quality, the double data entry technique can be implemented, where two different individuals enter the information from the same questionnaire This system is designed to identify discrepancies in the data entries, and in case of any differences, a third person is required to resolve the issue and input the accurate value.
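The double-entry comparison can be sketched as a field-by-field check; the record structure and values here are hypothetical:

```python
def find_discrepancies(entry1, entry2):
    """Compare two independent entries of the same questionnaire and
    return the fields whose values disagree, to be resolved by a third person."""
    return {field: (entry1[field], entry2[field])
            for field in entry1
            if entry1[field] != entry2[field]}

first_pass  = {"id": "A001", "age": 47, "sbp": 138}
second_pass = {"id": "A001", "age": 47, "sbp": 183}   # transposed digits
print(find_discrepancies(first_pass, second_pass))    # {'sbp': (138, 183)}
```

Only the disagreeing field is flagged; identical entries return an empty report.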

At the data analysis stage, we can run descriptive statistics to check data range and frequency for categorical variables.

1.9 Plan for Statistical Analysis

Before data collection begins, a comprehensive statistical analysis plan is essential for effective research This plan should outline the analytical approach based on the research objectives, incorporating both descriptive and analytical statistics It is important to detail how key variables will be summarized, followed by more in-depth analyses, which may include bivariable and multivariable techniques A well-structured statistical plan culminates in the creation of a dummy table.

In this context, the term "table" encompasses various formats such as figures, graphs, or text A dummy table represents our anticipated presentation of analysis results aligned with our objectives, as illustrated in Fig 1.19.

Objective: To compare blood glucose level between genders. The dummy table lists each variable with its definition, status, variable name, level of measurement, unit or code, precision and missing-data code:

Blood glucose: as measured; Dependent; variable name glu; Interval; mmol/L; precision 0.1; missing code 999

Gender: as reported; Independent; variable name sex; Nominal; 1 = Male, 2 = Female

Planned analysis: if glu is normally distributed, run the Independent sample t-test; if glu is not normally distributed, run the Mann-Whitney U test.

Planned presentation (placeholders to be filled after analysis): Male nn.n (n.n), test statistic n.nnn, df nn, P = 0.nnn

Fig 1.19 Example of a dummy table

1.10 Critical Information in Research Proposal

In summary, the critical parts in planning research are:

1 Good summary of the literature search, represented as a conceptual framework

Anyone who is able to provide these four pieces of information in their research proposal should face only little problem when doing their research.

Data analysis is categorized into two main types: descriptive and analytical Descriptive statistics focus on summarizing individual variables, while analytical statistics aim to explore the relationships between two or more variables.

Keywords Descriptive · Analytical · Hypothesis testing · Inferential statistics

Generally, the analysis consists of two parts, descriptive statistics and analytical statistics.

2.1 Descriptive Statistics

Describe Numerical Data

In conducting a demographic analysis of the study samples, it is essential to begin with an examination of the age distribution Utilizing SPSS for numerical data analysis allows for a comprehensive understanding of the demographic characteristics, providing valuable insights into the sample population.

4 Move age to Dependent List box

Fig 2.1 How to describe a variable?

A study conducted on 150 adults aimed to assess the prevalence of high blood pressure and identify associated factors Key variables analyzed included age, gender, monthly income (in RM), smoking status, body mass index (BMI), fasting blood glucose levels, and fasting total cholesterol levels.

The results here show that the mean (SD) is 37.6 (7.6) years and the median (IQR) is 37.5 (10) years, ranging from 18 to 56 years old. Skewness and kurtosis are −0.254 and −0.198, respectively.

Screen 2.1 How to describe numerical variable using Explore

Output 2.1 Descriptive statistics of Age

Looking at the histogram above, we might not be able to appreciate its distribution. What we can do is to fit a distribution curve on it.

SPSS Analysis: Fitting normal curve

1 At the output window, double click on the graph to open up Chart Editor

3 Close the Properties window (You can skip this step)

The distribution curve (Output 2.3) is a symmetrical bell shape, which further strengthens our assumption that age is normally distributed. Therefore, age should be described using the mean and standard deviation.

The analysis of glucose levels reveals a non-normal distribution, with a mean of 5.568 mmol/L and a median of 5.218 mmol/L. The histogram shows right (positive) skewness of 1.62 and a kurtosis of 3.4. Consequently, it is more appropriate to describe glucose using the median and interquartile range (IQR) rather than the mean and standard deviation.
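The median and IQR for such a skewed variable can be computed directly; the glucose values below are made up to mimic a right-skewed sample, not the book's data:

```python
import numpy as np

# Made-up right-skewed glucose values (mmol/L): a long upper tail
glucose = np.array([4.6, 4.9, 5.0, 5.2, 5.3, 5.6, 6.1, 7.4, 9.8])

median = np.median(glucose)
q1, q3 = np.percentile(glucose, [25, 75])
print(median)              # 5.3
print(round(q3 - q1, 2))   # 1.1  (the IQR, Q3 minus Q1)
```

For skewed data these summaries resist the pull of extreme values, which is why they are preferred over mean and SD here.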

Screen 2.2 How to add Normal curve

Describe Categorical Data

In analysing categorical data, we use frequency counts. In the study population, females represent 55.3 % (n = 83); the remaining 44.7 % are males, which can be inferred without being reported separately.

Output 2.4 Descriptive statistics for Glucose

Output 2.5 Positively skewed distribution of

We can continue to describe the other variables; the suggested presentation of this exercise is shown in Table 2.1.

Screen 2.3 Describe Gender using Frequencies

Output 2.6 Descriptive statistics for Gender

2.2 Analytical Statistics

Concept in Causal Inference

The ultimate inference we would normally like to make is causation: 'A' is causing 'B', such as smoking causing lung cancer. However, as mentioned in Chap. 1,

Fig 2.3 How statistics estimates parameter

there are usually many causes to one effect. Not everyone who smokes will get lung cancer, and not every lung cancer patient is a smoker.

Association exists on a spectrum, from detectable differences between variables to causation For instance, a study measuring high blood pressure prevalence among men in a village may reveal higher rates among smokers compared to non-smokers, indicating a difference in prevalence However, this observation alone does not establish that smoking causes high blood pressure, as there is no evidence showing that the men with high blood pressure began smoking prior to their diagnosis Moreover, such findings are limited to that specific instance and may not hold true across different populations or time periods.

Let us study the association of X and Y as depicted in Fig. 2.4. There is a difference between observations A and B, which lie at x1 and x2, respectively; B is at y2, higher than A at y1.

At this stage we note only the difference between values, for example that C is higher than B by y3 − y2. However, with the additional information about C at x3, we can identify a trend: as the value of X increases, the value of Y also rises. If we obtain another value of x beyond x3, we can even predict its corresponding value of y.

The relationship between two variables can range from differences in their observations to the prediction of one variable based on another When this association is particularly strong, it may indicate causation Professor Hill proposed eight guidelines in 1965 to help establish causation.

2 Consistency of the observed association

4 Temporal relationship of the association

5 The presence of biological gradient or dose-response relationship

7 Possible to appeal to experimental evidence

Hypothesis Testing

We hypothesize that changes observed in samples may also reflect trends in the broader population, but findings in samples may not always apply to the entire population To validate our observations, it is essential to test whether the associations identified in samples are also present in the population For instance, consider the hypothesis that smokers have a higher prevalence of hypertension compared to non-smokers If the prevalence is 45% among smokers and 40% among non-smokers, we must determine if this 5% difference is representative of the population or merely a coincidence within our sample If it is a chance finding, the results will likely vary when different samples from the same population are analyzed.

Hypothesis testing follows these steps:

State the Hypothesis

In hypothesis testing, we define both the null hypothesis (H0) and the alternative hypothesis (Ha). The null hypothesis asserts that there is no difference, for example that the prevalence of hypertension is the same between smokers and non-smokers. The alternative hypothesis states that a difference exists, and can be two-tailed (any difference in prevalence) or one-tailed (specifying whether the prevalence is higher among smokers or among non-smokers). A one-tailed hypothesis requires strong prior justification for the assumed direction.

If P1 is the prevalence of hypertension among smokers and P2 the prevalence among non-smokers, with H0 as the null hypothesis and Ha as the alternative hypothesis:

H0: P1 = P2
Ha: P1 ≠ P2 (two-tailed), or Ha: P1 > P2 (one-tailed)

Rejecting the null hypothesis is a more straightforward process than accepting the alternative hypothesis, as it aligns with the logic of falsification By successfully rejecting the null hypothesis, we move closer to accepting the alternative hypothesis For instance, proving that all swans are white poses a challenge, as it requires an infinite number of white swans to substantiate the claim In contrast, to refute the assertion that not all swans are white, it suffices to present just one black swan or any non-white swan.

Set a Criterion to Decide

In statistical analysis, the significance level is the criterion used to decide, typically expressed through the P value. The P value is the probability of committing a Type 1 error, which occurs when the null hypothesis (H0) is rejected despite being true.

The cut-off for the P value, also referred to as the alpha value, is typically set at 0.05 (5 %). A P value below 0.05 indicates less than a 5 % probability of committing a Type 1 error if we reject H0, so we reject H0; the smaller the P value, the stronger the evidence against H0. When the P value is greater than 0.05, we cannot reject H0.

We use 'reject' or 'not reject', instead of 'reject' and 'accept' the H0. This is very philosophical. In court, when someone is charged with any misconduct, the 'hypothesis' is, "he is not guilty (until proven otherwise)". If the evidence is sufficient (beyond reasonable doubt), the judge (or jury in some countries) will issue a guilty verdict, which is actually rejecting the hypothesis of not guilty. However, if the evidence is insufficient, the hypothesis stands: the burden of proof is on the prosecutor to establish guilt, and if the prosecutor fails to do so, the judge declares the defendant not guilty, which is not the same as declaring him innocent.

Fig 2.5 Hypothesis testing

Fisher (1925) proposed this value (0.05) as a convenient cut-off point to judge the significance of deviations, and it has since become a standard criterion in practice.

Choosing Suitable Statistical Test

Next is to choose a suitable statistical test to test the hypothesis. The choice of test depends on:

2 Level of measurements for dependent and independent variable

3 Number of dependent and independent variable

4 Distribution of the numerical measures whether Normal or not

Table 2.2 How to choose statistical test

Variable 1 | Variable 2 | Suggested test
Numerical normal | N/A | One-sample t-test
Numerical not normal | N/A | Wilcoxon signed-rank test
Categorical | Categorical | χ² test or Fisher's exact test
Categorical (2 categories) | Numerical normal | Independent sample t-test
Categorical (2 categories) | Numerical not normal | Mann-Whitney U or log rank test
Categorical (> 2 categories) | Numerical normal | Logistic regression*
Categorical (> 2 categories) | Numerical not normal | Logistic regression*
Numerical normal | Numerical normal | Pearson correlation coefficient test
Numerical normal | Numerical not normal | Spearman correlation coefficient test
Numerical not normal | Categorical | Kruskal-Wallis test
Numerical not normal | Numerical normal | Spearman correlation coefficient test
Numerical not normal | Numerical not normal | Spearman correlation coefficient test

Paired variables:
Numerical normal | Numerical normal | Paired t-test
Numerical not normal | Numerical not normal | Wilcoxon signed-rank test

*If you wish just to test for a significant difference, you could categorise the variable and use the χ² test instead

Numerous resources exist, including tables, guidelines, graphics, and algorithms, to assist in selecting the appropriate statistical test This book offers a straightforward guideline in Table 2.2, and readers are encouraged to explore additional materials for more in-depth understanding.

Making a Decision

The test yields a P value that is compared with the cut-off point determined before data analysis. If we set the cut-off at P < 0.05, a P value below this threshold means we reject the null hypothesis (H0) and conclude that the observed difference is statistically significant.

In the subsequent chapters, a step-by-step description of most of the bivariable analyses will be shown; from Sect. 2.6 onwards, description and guidance on multivariable analyses will follow.

2.3 Comparing Means

Compare One Mean

Using Data 2.1 above, if we wish to compare the observed mean BMI with a value previously reported (24.37 kg/m²) (Azmi et al. 2009), we could use the one-sample t-test.

SPSS Analysis: One-Sample T-test

4 Move‘bmi’to Test variable(s) box

5 Enter 24.37 (which was obtained from other study) into Test Value box

The analysis showed that BMI observed in this study (25.81 kg/m 2 ) is significantly higher (P < 0.001) 2 compared to the mean BMI from Azmi MY et al.

2 Never write P = 0.000 because it is meaningless. The P value is not zero; it is just too small for SPSS to display. The correct way is to state it as P < 0.001.
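The same test can be run outside SPSS with scipy; the BMI values below are made up for illustration, while the reference mean of 24.37 kg/m² is the one quoted in the text:

```python
from scipy.stats import ttest_1samp

# Made-up BMI sample (kg/m^2); the popmean is the published value from the text
bmi = [25.1, 26.3, 24.9, 27.0, 25.8, 26.6, 25.2, 26.1]
t_stat, p_value = ttest_1samp(bmi, popmean=24.37)

# Report very small P values as "P < 0.001", never "P = 0.000"
print(round(t_stat, 2),
      "P < 0.001" if p_value < 0.001 else f"P = {p_value:.3f}")
```
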

Compare Two Means

Using Data 2.1, if we would like to compare the mean plasma glucose (in mmol/L) between males and females, we could use the Independent sample t-test.

SPSS Analysis: Independent sample T-test

4 Move Glucose to Test Variable(s) box

5 Move gender to Grouping Variable box

Screen 2.4 Testing one mean to a known value using One-sample t-test

Output 2.7 Comparing one mean to a known value

7 Type 0 for Group 1 and 1 for Group 2 (because the code for gender is

The first table, titled Group Statistics, summarizes the mean glucose levels for males and females, revealing that females have a slightly higher mean glucose level at 5.6 mmol compared to 5.5 mmol for males Additionally, Levene’s Test for Equality of Variances is conducted to assess the variance between the two groups.

Since P = 0.065,3 we can assume that the variances are equal; therefore, read the t-test result from the 'Equal variances assumed' row.

Screen 2.5 How to compare two means using Independent sample t-test

3 The Ho for Levene's Test is that there is no difference of variances between the groups. If P > 0.05, we cannot reject Ho and therefore assume that the variances are equal.

The t-test P value of 0.508 indicates that we cannot reject the null hypothesis (H0); there is no significant difference in mean blood glucose between males and females.
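The whole workflow, Levene's test first and then the t-test with the matching equal-variance setting, can be sketched in scipy (made-up glucose values, not Data 2.1):

```python
from scipy.stats import levene, ttest_ind

# Made-up glucose values (mmol/L) for illustration only
male   = [5.1, 5.4, 5.6, 5.2, 5.8, 5.5, 5.3]
female = [5.5, 5.7, 5.4, 5.9, 5.6, 5.8, 5.5]

lev_stat, lev_p = levene(male, female)
equal_var = lev_p > 0.05          # Ho of Levene's test: the variances are equal
t_stat, p_value = ttest_ind(male, female, equal_var=equal_var)
print(f"Levene P = {lev_p:.3f}, t-test P = {p_value:.3f}")
```

When Levene's P is below 0.05, `equal_var=False` switches scipy to Welch's t-test, the counterpart of SPSS's 'Equal variances not assumed' row.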

Compare More Than Two Means

When comparing more than two groups, we use Analysis of Variance (ANOVA) instead of multiple t-tests, to reduce the risk of Type 1 error. In Data 2.1, we compare blood glucose levels (in mmol/L) across three BMI categories based on the WHO 2003 recommendation for Asians: Normal (BMI below 23 kg/m²), Overweight (BMI from 23 to below 27.5 kg/m²) and Obese (BMI of 27.5 kg/m² and above). To prepare this analysis, we use Visual Binning in SPSS to convert the numerical variable into a categorical one.

3 Move BMI to Variables to Bin

5 Type a name for the new variable (e.g bmistat)

7 For Row 1, enter 23 in the Value, then Normal as the Label; 27.5 and Overweight for Row 2 and leave HIGH in Row 3, type Obese for the Label.

8 Check Exclude in the Upper Endpoints

10 Confirm to create a new variable by clicking OK when prompted.

We will get a new variable (Screen 2.7) located at the end of the variable list. We are now able to compare the means of glucose.

SPSS Analysis: One-way ANOVA

4 Move Glucose to Dependence List

5 Move BMI Status to Factor

6 Click Options Check Descriptive and Homogeneity of variance test Then click Continue.

Screen 2.6 Categorising numerical variable using Visual Binning

7 Click Post Hoc. Based on the result of the Homogeneity of variance test later, choose an appropriate Post Hoc test.4 Click Continue.

The mean glucose levels for Normal and Overweight individuals were similar at 5.3 (1.4) mmol/L and 5.4 (1.3) mmol/L, respectively, while Obese subjects had a higher mean of 6.3 (1.3) mmol/L, with a significant difference noted (F(df = 2, 147) = 6.960, P = 0.01) This suggests at least one significant difference among the groups (Normal, Overweight, and Obese) Levene’s test confirmed equal variances (P > 0.05), allowing the use of post hoc tests under Equal Variances Assumed, specifically Scheffe’s test The results indicated significant differences between Normal–Obese (P = 0.030) and Overweight–Obese (P = 0.002).

Screen 2.7 A new variable is created after Visual Binning done

When comparing multiple means, ANOVA indicates significant differences only when at least two means vary significantly However, as an omnibus test, it does not specify which means differ To identify these differences, post hoc tests are necessary, although there is no universally superior post hoc test; any suitable test can be utilized.
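A minimal sketch of the omnibus test in scipy (made-up glucose values per BMI category, not Data 2.1; a post hoc test would follow a significant result):

```python
from scipy.stats import f_oneway

# Made-up glucose values (mmol/L) per BMI category, for illustration only
normal     = [5.0, 5.2, 5.4, 5.1, 5.6, 5.3]
overweight = [5.3, 5.5, 5.2, 5.6, 5.4, 5.5]
obese      = [6.1, 6.4, 6.0, 6.5, 6.2, 6.6]

f_stat, p_value = f_oneway(normal, overweight, obese)
print(f"F = {f_stat:.2f}, P = {p_value:.4f}")
# A significant omnibus P only says "at least one pair of means differs";
# a post hoc test (e.g. Scheffe or Tukey) identifies which pair.
```
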

Screen 2.8 Compare means using One-way ANOVA

Output 2.9 Descriptive statistics and ANOVA analysis

Output 2.10 Post hoc test result

Compare Paired Means

The independent sample t-test and one-way ANOVA compare independent means; they are not suitable for dependent (paired) means. In Data 2.2, 150 subjects participated in a 6-month weight reduction programme, and each subject's post-programme body weight is influenced by his or her pre-intervention weight.

To compare the weight before and after the programme, we should use paired t-test 6

4 Move Body weight before to Variable 1 Pair 1 and Body weight after to Variable 2 Pair 1

The mean body weight reduced from 53.8 to 51.6 kg. The difference of 2.2 (0.5) kg was significant (P < 0.001).
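A paired t-test on made-up before/after weights (not Data 2.2) can be run with scipy's ttest_rel, which tests whether the mean within-subject difference is zero:

```python
from scipy.stats import ttest_rel

# Made-up weights (kg) of the same six subjects before and after the programme
before = [55.2, 60.1, 52.4, 58.7, 49.9, 63.0]
after  = [53.0, 58.2, 50.1, 56.9, 48.0, 60.4]

t_stat, p_value = ttest_rel(before, after)
print(f"t = {t_stat:.2f}, P = {p_value:.5f}")
```

Because each after-value is paired with its own before-value, the test uses the per-subject differences rather than treating the two lists as independent samples.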

A study involving 150 participants assessed their blood pressure before and after a 6-month weight management program The participants' weight (in kg) and blood pressure readings were meticulously documented both prior to and following the intervention.

In the context of independent relationships, one variable's measurement is not influenced by another, such as body weight and sex However, when assessing the body weight of the same individuals before and after a specific health or lifestyle intervention, the post-intervention weight is influenced by the pre-intervention weight, indicating a dependent relationship in this scenario.

6 Here, we are comparing two dependent measurements. If we have more than two repeated measurements, we should use Repeated Measures ANOVA (Sect. 2.7).

2.4 Comparing Proportions

Compare Independent Proportions

To test the differences in the proportion of high blood pressure across BMI categories, we can use the chi-square test on Data 2.1.

Screen 2.9 Comparing paired means using paired t-test

Output 2.11 Result from paired t-test

7 The symbol for chi-square is χ 2 ; not X 2

Screen 2.10 Comparing proportions using chi-square test

SPSS Analysis: Chi-square test

4 Move BMI Status to Rows(s)

5 Move Blood pressure to Column(s) 8

6 Click Statistics Check Chi-square Then click Continue

7 Click Cells Check Row Then click Continue

Output 2.12 Result of chi-square test

When organizing data, it's advisable to place the dependent variable in the Column section, as this facilitates the request for Row Percent in Step 7 This setup allows for a clear comparison of High Blood Pressure across Normal, Overweight, and Obese categories.

Screen 2.11 Comparing paired proportions using McNemar test

Output 2.13 Result for McNemar test

It was pretty obvious that the proportion of the Obese group who had high blood pressure was higher (76.5 %) compared to both the Normal and Overweight groups. The difference was significant (χ²(df = 2) = 10.654, P = 0.005).9
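The chi-square test on a BMI-by-blood-pressure table can be reproduced with scipy; the counts below are made up for illustration, not the study's:

```python
from scipy.stats import chi2_contingency

# Made-up 3x2 table: rows = Normal / Overweight / Obese,
# columns = normal blood pressure / high blood pressure
table = [[40, 12],
         [35, 18],
         [ 8, 26]]

chi2, p_value, df, expected = chi2_contingency(table)
print(f"chi2(df = {df}) = {chi2:.3f}, P = {p_value:.4f}")
```

`chi2_contingency` also returns the expected counts, which is useful for checking the test's assumption that expected cell counts are not too small.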

Compare Paired Proportions

Chi-square is appropriate for comparing independent proportions, while McNemar's test should be utilized for dependent proportions, such as when evaluating blood pressure changes before and after a 6-month weight management program.

4 Move Blood pressure before into the Rows(s) box

5 Move Blood pressure after into the Column(s) box

From 78 subjects with high blood pressure initially, 29 (37 %) showed lower blood pressure status after the programme (P < 0.001).
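McNemar's exact test uses only the discordant pairs and is equivalent to a binomial test on them. In the sketch below, the 29 improved subjects follow the text, while the other counts are assumptions for illustration:

```python
from scipy.stats import binomtest

# Hypothetical paired 2x2 counts of blood pressure status:
#                 after high   after normal
# before high         49            29       <- 29 improved (from the text)
# before normal        4            68       <- 4 worsened (assumed)
b, c = 29, 4                                 # the discordant cells

# Exact McNemar test = two-sided binomial test on the discordant pairs
p_value = binomtest(min(b, c), n=b + c, p=0.5).pvalue
print(f"P = {p_value:.5f}")
```

The concordant cells (49 and 68) carry no information about change, so only the 33 discordant pairs enter the test.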

2.5 Comparing Ranks

2.6 Covariance, Correlation and Regression

2.7 General Linear Model
