
Sample size estimation is a common problem when designing a study. The pages in this file will help you with studies involving comparison of means or proportions. It is important to have an idea about what constitutes a clinically important difference in outcomes. It is also important to have some idea about the variability you are likely to observe in the data you will eventually collect. (This estimate is most often obtained from previous studies similar to yours.) Statistical consultation about sample size should always be obtained before you begin collecting data.
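As a rough illustration of how the clinically important difference and the expected variability combine into a sample size, here is a minimal sketch for comparing two means. It assumes a two-sided test, equal group sizes, a common standard deviation, and the normal approximation n = 2 * ((z_alpha + z_beta) * sd / delta)^2. The function name and its defaults (alpha = 0.05, power = 0.80) are illustrative choices, not taken from the spreadsheet pages.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate subjects needed per group to compare two means.

    delta: smallest clinically important difference between the means
    sd:    expected standard deviation within each group
    Uses the normal approximation, so it slightly underestimates the
    size a t-based calculation would give for small studies.
    """
    if delta == 0 or sd <= 0:
        raise ValueError("delta must be nonzero and sd must be positive")
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    n = 2 * ((z_alpha + z_beta) * sd / delta) ** 2
    return math.ceil(n)  # round up to a whole subject


# Detecting a 5-unit difference when the standard deviation is about 10
print(n_per_group(delta=5, sd=10))
```

Halving the difference you want to detect roughly quadruples the required sample size, which is why the choice of a clinically important difference matters so much.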
Frequency distributions are one of the most important tools used in the analysis of experimental data. Many statistical tests are carried out by choosing a frequency distribution that closely matches the distribution of your observations. The mathematical properties of the matched frequency distribution are then used to calculate the probability of observing data like yours just by chance. Frequency distributions are also used to estimate confidence intervals. Choosing the most appropriate distribution to use to represent your data often involves getting expert help from an experienced statistician.
The frequency distributions most commonly used in biostatistics include the Student’s t, normal, chi-square, binomial, gamma, and F distributions. Many others are available to model certain types of observations. When a frequency distribution is scaled so that the total area under its curve equals one, it is called a "probability density function." These curves are mathematically complicated and their values are usually obtained from a table or calculated with a computer.
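The "area equals one" property can be checked numerically. The sketch below uses the normal curve as the example and a simple trapezoid rule. The helper names and the integration range of -8 to 8 are assumptions made for illustration.

```python
import math


def normal_pdf(x, mean=0.0, sd=1.0):
    """Height of the normal probability density function at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def trapezoid_area(f, lo, hi, steps=10_000):
    """Approximate the area under f between lo and hi (trapezoid rule)."""
    width = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * width)
    return total * width


# Eight standard deviations either side covers essentially the whole curve
area = trapezoid_area(normal_pdf, -8.0, 8.0)
print(area)
```

The same check over a narrower range, such as -1.96 to 1.96, gives the familiar central 95% of the curve.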
Each distribution has one or more parameters that are used to set its center, shape, degree of asymmetry, and other properties. For example, the parameters of the normal distribution are the mean and the standard deviation. These two parameters uniquely specify its center and shape. (The standard normal distribution is the special case with a mean of zero and a standard deviation of one.)
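To see how the two parameters of a normal distribution act, the following sketch builds three curves with Python's standard-library NormalDist and reports the interval that holds the central 95% of each. The particular means and standard deviations are made-up values for illustration.

```python
from statistics import NormalDist


def central_interval(dist, coverage=0.95):
    """Return the (low, high) limits holding the central share of dist."""
    tail = (1 - coverage) / 2
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)


narrow = NormalDist(mu=100, sigma=5)    # baseline curve
wide = NormalDist(mu=100, sigma=15)     # same center, more spread
shifted = NormalDist(mu=120, sigma=5)   # same spread, center moved

for name, dist in [("narrow", narrow), ("wide", wide), ("shifted", shifted)]:
    low, high = central_interval(dist)
    print(f"{name}: mean={dist.mean}, sd={dist.stdev}, "
          f"95% of values between {low:.1f} and {high:.1f}")
```

Changing the mean slides the interval along the axis without changing its width, while changing the standard deviation widens or narrows it around the same center.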
This spreadsheet demonstrates several distributions commonly used in biostatistics. The parameters are adjustable with ‘scrollbars’, and the graphs of the distributions are redrawn so that you can see the effect of different combinations of parameters. You can also use the spreadsheet in place of a table of the distribution. The values generated are accurate to about 10 decimal places.
Statistical Frequency Distributions
Comparing the means or proportions from more than two groups means that there is more than one possible two-way comparison. As the number of groups increases, the number of possible two-way comparisons increases rapidly. Adjustments in your statistical procedures should take this fact into account to avoid underestimation of your experiment-wise Type I error rate. Statistical consultation should always be obtained to avoid making this type of mistake in the statistical interpretation of your data.
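The growth in the number of comparisons, and its effect on the chance of at least one false positive, can be sketched as follows. The calculation assumes the comparisons are independent, which is a simplification because pairwise comparisons that share groups are correlated. The Bonferroni adjustment shown is one common correction, used here only as an example, and the function names are illustrative.

```python
import math


def pairwise_comparisons(groups):
    """Number of distinct two-way comparisons among the groups."""
    return math.comb(groups, 2)  # groups * (groups - 1) / 2


def experimentwise_error(comparisons, alpha=0.05):
    """Chance of at least one false positive across all comparisons.

    Assumes independent tests, each run at significance level alpha.
    """
    return 1 - (1 - alpha) ** comparisons


def bonferroni_alpha(comparisons, alpha=0.05):
    """Per-comparison level that keeps the overall rate at or below alpha."""
    return alpha / comparisons


for k in (3, 5, 10):
    m = pairwise_comparisons(k)
    print(f"{k} groups: {m} comparisons, "
          f"unadjusted error rate {experimentwise_error(m):.3f}, "
          f"Bonferroni per-test alpha {bonferroni_alpha(m):.4f}")
```

With five groups there are already ten comparisons, and under these assumptions the chance of at least one spurious "significant" result is about 40% if each test is run at the 0.05 level.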