What is Statistical Data Analysis?

Statistics is the science of collecting, interpreting, and validating data. Statistical data analysis is the procedure of performing statistical operations on data. It is a form of quantitative research, which seeks to quantify the data and typically applies some form of statistical analysis. Quantitative data typically include descriptive data, such as survey data and observational data.

Statistical data analysis generally relies on statistical tools that cannot be applied properly without some statistical knowledge. Various software packages are available for statistical data analysis, including Statistical Analysis System (SAS), Statistical Package for the Social Sciences (SPSS), StatSoft, etc.

Data in statistical data analysis consist of one or more variables. The data may be univariate or multivariate, and the researcher chooses different statistical techniques depending on the number of variables.

If the data contain several variables, multivariate techniques can be applied, such as factor analysis and discriminant analysis. Similarly, if the data contain a single variable, univariate statistical data analysis is performed. This includes the t test for significance, the z test, the F test, one-way ANOVA, etc.

The data in statistical data analysis are of two basic types: continuous data and discrete data. Continuous data are measured rather than counted. For example, the intensity of a light can be measured but cannot be counted. Discrete data can be counted. For example, the number of bulbs can be counted.

Statistical tools in analytical method validation

Mean
The mean, or average, of a data set is the most basic and most commonly used statistic. It is calculated by adding all data points and dividing the sum by the number of data points.
Standard deviation
The standard deviation of a data set measures the spread of the values in the sample set. It is computed from the differences between the individual values and the mean: the squared differences are summed, divided by the number of values minus one (for a sample), and the square root is taken.
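As a minimal sketch of the two statistics described above, the following Python snippet computes the mean and the standard deviation by hand and checks them against the standard library. The replicate values are invented for illustration, and the n - 1 divisor (sample rather than population standard deviation) is an assumption about which form is intended.

```python
import math
import statistics

# Hypothetical replicate results (e.g. assay values); not taken from the text.
results = [10.1, 10.3, 10.2, 10.4, 10.0]

# Mean: sum of all data points divided by the number of data points.
mean = sum(results) / len(results)

# Sample standard deviation: square root of the summed squared deviations
# from the mean, divided by n - 1 (assumed here; divide by n for a population).
squared_devs = [(x - mean) ** 2 for x in results]
std_dev = math.sqrt(sum(squared_devs) / (len(results) - 1))

# The standard library functions give the same values.
assert math.isclose(mean, statistics.mean(results))
assert math.isclose(std_dev, statistics.stdev(results))

print(f"mean = {mean:.3f}, standard deviation = {std_dev:.3f}")
```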
Regression analysis
Regression analysis is used to evaluate a linear relationship between test results. A linear relationship is, in general, evaluated over the range of the analytical procedure. The data obtained from analysis of solutions prepared at a range of different concentration levels are usually examined by plotting the response against concentration.
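A linearity check of this kind can be sketched as an ordinary least-squares fit of response against concentration. The calibration data below are hypothetical, and the use of the coefficient of determination as a summary of linearity is one common choice rather than something the text prescribes.

```python
# Hypothetical calibration data: concentration levels and instrument responses.
concentrations = [1.0, 2.0, 3.0, 4.0, 5.0]
responses = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(concentrations)
mean_x = sum(concentrations) / n
mean_y = sum(responses) / n

# Sums of squares and cross-products around the means.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(concentrations, responses))
s_xx = sum((x - mean_x) ** 2 for x in concentrations)
s_yy = sum((y - mean_y) ** 2 for y in responses)

# Ordinary least squares for the line: response = slope * concentration + intercept.
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Coefficient of determination: fraction of response variance explained by the line.
r_squared = s_xy ** 2 / (s_xx * s_yy)

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, R^2 = {r_squared:.4f}")
```

A value of R^2 close to 1 is consistent with a linear response over the tested range, although a plot of the residuals remains the more informative check.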
Hypothesis tests
Hypothesis tests are intended to verify whether the experimental data are consistent with a given theoretical hypothesis.

The first step is to state the null hypothesis, symbolized by H0, which assumes that the two elements or series of elements are equal.

The second step consists of measuring the deviation between the characteristics being compared.

The third step is to calculate the probability P of obtaining a deviation at least this large if H0 is true.

The fourth step is to draw the appropriate conclusion:

If P is large, we accept that H0 is plausible; if P is small, the deviation is incompatible with H0. The threshold that determines whether P is large or small is the significance level α (usually α = 0.05, which corresponds to a confidence level of 1 - α = 0.95).
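The four steps can be illustrated with an exact permutation test, chosen here only because it needs nothing beyond the Python standard library; the text does not prescribe a particular test, and a t-test would follow the same pattern. The two series of results are hypothetical.

```python
from itertools import combinations

# Hypothetical results from two series (e.g. two methods); not from the text.
series_a = [10.1, 10.3, 10.2, 10.4, 10.2]
series_b = [10.6, 10.8, 10.7, 10.9, 10.7]
alpha = 0.05  # significance level


def mean(values):
    return sum(values) / len(values)


# Step 1: H0 states that the two series come from the same population.
# Step 2: measure the deviation, here the absolute difference between means.
observed = abs(mean(series_a) - mean(series_b))

# Step 3: P is the fraction of all ways of splitting the pooled data into two
# groups that give a deviation at least as large as the observed one.
pooled = series_a + series_b
indices = range(len(pooled))
extreme = 0
total = 0
for group in combinations(indices, len(series_a)):
    chosen = set(group)
    first = [pooled[i] for i in indices if i in chosen]
    second = [pooled[i] for i in indices if i not in chosen]
    # Small tolerance guards against floating-point rounding in the comparison.
    if abs(mean(first) - mean(second)) >= observed - 1e-12:
        extreme += 1
    total += 1
p_value = extreme / total

# Step 4: conclude by comparing P with the significance level.
reject_h0 = p_value < alpha
print(f"P = {p_value:.4f}, reject H0: {reject_h0}")
```

Because every value in the second series exceeds every value in the first, only the observed split and its mirror image are this extreme, so P is small and H0 is rejected at α = 0.05.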

Four situations are possible:

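Accepting a true H0: correct decision.

Rejecting a true H0: error of the first kind (type I error, probability α).

Accepting a false H0: error of the second kind (type II error, probability β).

Rejecting a false H0: correct decision.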
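The meaning of α as the probability of rejecting a true H0 can be checked with a small simulation. The sketch below assumes a two-sided z-test with a known standard deviation, normally distributed results, and arbitrary values for the sample size and number of trials; none of these choices come from the text.

```python
import random
from statistics import NormalDist, fmean

random.seed(0)   # fixed seed so the run is repeatable
alpha = 0.05     # significance level
sigma = 1.0      # known standard deviation (assumed for the z-test)
n = 10           # results per series
trials = 2000


def z_test_p_value(a, b):
    """Two-sided P for equal means, assuming a known common sigma."""
    z = (fmean(a) - fmean(b)) / (sigma * (2 / n) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


rejections = 0
for _ in range(trials):
    # Both series come from the same population, so H0 is true by construction.
    a = [random.gauss(0.0, sigma) for _ in range(n)]
    b = [random.gauss(0.0, sigma) for _ in range(n)]
    if z_test_p_value(a, b) < alpha:
        rejections += 1  # rejecting a true H0: an error of the first kind

type_i_rate = rejections / trials
print(f"observed rate of rejecting a true H0: {type_i_rate:.3f}")
```

The observed rate should fall close to α, since each rejection in this setup is by definition an error of the first kind.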

Other statistical tools

Other statistical tools used in method validation include comparative studies using Student's t-test, Fisher's F-test, analysis of variance (ANOVA), design of experiments, and assessment of outliers. Information on these statistical tools can be obtained from the statistics references suggested in the reference section.

Validation characteristics

Specificity is a quantitative indication of the extent to which a method can distinguish between the analyte of interest and interfering substances on the basis of signals produced under actual experimental conditions. Random interferences should be determined using representative blank samples.

Accuracy refers to the closeness of agreement between the true value of the analyte concentration and the mean result obtained by applying the experimental procedure to a large number of homogeneous samples. It is related to systematic error and analyte recovery. Systematic errors can be established by using appropriate (matrix-matched) certified reference materials or by applying alternative analytical techniques.
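As a rough illustration of how accuracy is often summarized, the snippet below computes the bias and the percent recovery against a certified reference value. The certified value and the replicate results are hypothetical, and the variable names are chosen only for this example.

```python
# Hypothetical data: certified concentration of a reference material and
# replicate results obtained with the method under validation.
certified_value = 50.0
measured = [49.2, 49.8, 50.1, 49.5, 49.4]

mean_result = sum(measured) / len(measured)

# Bias: systematic difference between the mean result and the accepted true value.
bias = mean_result - certified_value

# Recovery: mean result expressed as a percentage of the accepted true value.
recovery_percent = 100.0 * mean_result / certified_value

print(f"mean = {mean_result:.2f}, bias = {bias:.2f}, recovery = {recovery_percent:.1f}%")
```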

Precision is assessed by comparing results obtained from samples prepared to test the following conditions:

Repeatability expresses the precision under the same operating conditions over a short interval of time. Repeatability is also termed intra-assay precision.

Intermediate precision expresses within-laboratory variation: different days, different analysts, different equipment, etc.

Reproducibility expresses the precision between laboratories (collaborative studies, usually applied to standardization of methodology).
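One common way to separate repeatability from intermediate precision is a one-way analysis of variance on replicates grouped by day. The sketch below assumes a balanced design (the same number of replicates each day) and uses invented results; it shows one convention, not the only way to estimate these quantities.

```python
import math

# Hypothetical results: three replicates on each of three days.
# A balanced design is assumed by the formulas below.
results_by_day = {
    "day 1": [10.0, 10.2, 10.1],
    "day 2": [10.3, 10.5, 10.4],
    "day 3": [9.9, 10.1, 10.0],
}

groups = list(results_by_day.values())
p = len(groups)       # number of days
n = len(groups[0])    # replicates per day

day_means = [sum(g) / n for g in groups]
grand_mean = sum(day_means) / p

# Within-day mean square: pooled variance of replicates around their day mean.
ss_within = sum((x - m) ** 2 for g, m in zip(groups, day_means) for x in g)
ms_within = ss_within / (p * (n - 1))

# Between-day mean square: spread of the day means around the grand mean.
ms_between = n * sum((m - grand_mean) ** 2 for m in day_means) / (p - 1)

# Variance components; a negative between-day estimate is set to zero.
var_repeatability = ms_within
var_between_days = max(0.0, (ms_between - ms_within) / n)

sd_repeatability = math.sqrt(var_repeatability)
sd_intermediate = math.sqrt(var_repeatability + var_between_days)

print(f"repeatability SD = {sd_repeatability:.3f}")
print(f"intermediate precision SD = {sd_intermediate:.3f}")
```

The same calculation, with laboratories in place of days, gives an estimate of reproducibility from a collaborative study.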