Statistics is the science of collecting, interpreting, and validating data. Statistical data analysis is
the procedure of applying various statistical operations to such data. It is a form of quantitative
research, which seeks to quantify the data and typically applies some form of statistical analysis.
Typical sources of quantitative data include survey data and observational data.

Statistical data analysis generally relies on statistical tools that are difficult to apply correctly
without statistical knowledge. Various software packages are available to perform it, including
Statistical Analysis System (SAS), Statistical Package for the Social Sciences (SPSS), and StatSoft,
among others.

The data in statistical data analysis consist of one or more variables: they may be univariate (a
single variable) or multivariate (several variables). The researcher chooses the statistical technique
according to the number of variables.

If the data contain several variables, multivariate techniques can be applied, such as factor analysis
and discriminant analysis. If the data contain a single variable, univariate techniques are used
instead, such as the t-test, z-test, F-test, and one-way ANOVA.

Data in statistical data analysis are of two basic types: continuous data and discrete data.
Continuous data are measured rather than counted; for example, the intensity of a light can be
measured but not counted. Discrete data can be counted; for example, the number of bulbs can be
counted.

**Mean**

The mean, or average, of a data set is the most basic and most commonly used statistic. It is
calculated by adding all the data points and dividing the sum by the number of data points.
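
As a minimal illustration of this calculation, the Python sketch below computes the mean of a small set of replicate measurements. The data values are hypothetical and chosen only for the example; the standard library's `statistics.mean` is shown alongside as a cross-check.

```python
import statistics


def mean(values):
    """Arithmetic mean: the sum of all data points divided by their number."""
    if not values:
        raise ValueError("the mean requires at least one data point")
    return sum(values) / len(values)


# Hypothetical replicate measurements (illustrative values only)
data = [10.2, 9.8, 10.1, 9.9, 10.0]

print(mean(data))             # approximately 10.0
print(statistics.mean(data))  # standard-library equivalent
```
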

**Standard deviation**

The standard deviation of a data set measures the spread of the values around the mean. It is
computed from the differences between the individual values and the mean: the squared differences
are summed, divided by n - 1 for a sample of n values, and the square root of the result is taken.
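
A minimal sketch of the sample standard deviation, assuming the usual n - 1 (Bessel-corrected) denominator for a sample rather than a whole population. The data are hypothetical, and the result is cross-checked against `statistics.stdev`.

```python
import math
import statistics


def sample_std(values):
    """Sample standard deviation with n - 1 in the denominator."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    m = sum(values) / n
    # Sum of squared deviations of each value from the mean
    ss = sum((x - m) ** 2 for x in values)
    return math.sqrt(ss / (n - 1))


# Hypothetical replicate measurements (illustrative values only)
data = [10.2, 9.8, 10.1, 9.9, 10.0]
s = sample_std(data)

# Cross-check against the standard-library implementation
assert math.isclose(s, statistics.stdev(data))
print(round(s, 4))  # 0.1581
```
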

**Regression analysis**

Regression analysis is used to evaluate the linear relationship between the test results and the
analyte concentration. Linearity is generally evaluated over the range of the analytical procedure.
The results obtained from analyzing solutions prepared at several different concentration levels are
usually examined by plotting them on a graph.
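
An ordinary least-squares fit is a common way to quantify such a linear relationship; a minimal sketch follows. The concentration and response values are hypothetical calibration data, and the function name `linear_regression` is illustrative. It returns the slope, the intercept, and the correlation coefficient r.

```python
import math


def linear_regression(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r), where r is the correlation coefficient.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three (x, y) pairs of equal length")
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical calibration: concentration levels and instrument responses
conc = [1.0, 2.0, 3.0, 4.0, 5.0]
resp = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept, r = linear_regression(conc, resp)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, r={r:.4f}")
```

A value of r close to 1 indicates a strong linear relationship, although inspecting the residuals against concentration is also advisable.
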

**Hypothesis tests**

Hypothesis tests are intended to verify whether experimental data are consistent with a given
theoretical hypothesis.

The first step is to state the null hypothesis, denoted H0, which assumes that the two elements, or
series of elements, being compared are equal.

The second step consists of measuring the deviation between the characteristics being compared.

The third step is to calculate the probability P of obtaining a deviation at least this large if H0 is true.

The fourth step is to draw the required conclusions:

If P is large, H0 is considered plausible; if P is small, the deviation is considered incompatible
with H0. The threshold used to decide whether P is large or small is the significance level α,
usually set at α = 0.05, which corresponds to a confidence level of 0.95 (95%).
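
The four steps can be sketched with a two-sided one-sample z-test, which assumes the population standard deviation `sigma` is known. When it must be estimated from the data, Student's t-test is used instead; its P value needs the t distribution, which the Python standard library does not provide. The function name `z_test` and the sample values are hypothetical.

```python
import math
from statistics import NormalDist, mean


def z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided one-sample z-test of H0: population mean == mu0 (sigma known)."""
    n = len(sample)
    # Step 2: measure the deviation, expressed in standard-error units
    z = (mean(sample) - mu0) / (sigma / math.sqrt(n))
    # Step 3: probability of a deviation at least this large if H0 is true
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    # Step 4: reject H0 when P falls below the significance level alpha
    return z, p, p < alpha


# Step 1: H0 states that the true mean equals the nominal value 10.0
# (hypothetical data; sigma = 0.2 is assumed known)
z1, p1, reject1 = z_test([10.2, 9.8, 10.1, 9.9, 10.0], mu0=10.0, sigma=0.2)
z2, p2, reject2 = z_test([10.5, 10.4, 10.6, 10.5, 10.5], mu0=10.0, sigma=0.2)
print(f"z={z1:.2f}, P={p1:.3f}, reject H0: {reject1}")
print(f"z={z2:.2f}, P={p2:.2g}, reject H0: {reject2}")
```
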

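Four situations are possible:

Accepting H0 when it is true (correct decision).

Rejecting H0 when it is true: error of the first kind, or type I error (probability α).

Accepting H0 when it is false: error of the second kind, or type II error (probability β).

Rejecting H0 when it is false (correct decision).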
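
The meaning of α and β can be illustrated by simulation. The sketch below repeatedly draws hypothetical normally distributed samples, applies a two-sided z-test with known sigma, and counts how often H0 is rejected. When the true mean equals the hypothesized mean, the rejection rate estimates the type I error α; when it does not, one minus the rejection rate estimates the type II error β. The function name `rejection_rate` and all parameter values are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist


def rejection_rate(true_mu, mu0=10.0, sigma=0.2, n=5, alpha=0.05,
                   trials=4000, seed=1):
    """Fraction of simulated samples for which a two-sided z-test rejects H0: mean == mu0."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


# H0 true: every rejection is a type I error, so the rate should be close to alpha
alpha_hat = rejection_rate(true_mu=10.0)
# H0 false (true mean shifted by 0.3): every acceptance is a type II error
beta_hat = 1 - rejection_rate(true_mu=10.3)
print(f"estimated alpha: {alpha_hat:.3f}")
print(f"estimated beta:  {beta_hat:.3f}")
```

Increasing the sample size n, or the size of the true shift, lowers β while α stays fixed by the chosen significance level.
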

**Other statistical tools**

Other statistical tools used in method validation include comparative studies using Student's
t-test, Fisher's F-test, analysis of variance (ANOVA), design of experiments, and the assessment of
outliers. Information on these statistical tools can be found in the statistics references suggested
in the reference section.
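
As an example of one of these tools, the F statistic of a one-way analysis of variance can be computed as sketched below; it compares the variance between group means with the variance within groups. The data (the same sample analyzed by three analysts) and the function name `one_way_anova_f` are hypothetical. The sketch stops at the F statistic: deciding significance requires comparing F with a critical value of Fisher's F distribution for the given degrees of freedom, taken from tables or from a library such as SciPy, whose `scipy.stats.f_oneway` returns both F and its P value.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    if k < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    n_total = sum(len(g) for g in groups)
    if n_total <= k:
        raise ValueError("need more observations than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of values around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical results for one sample analyzed by three analysts
groups = [[10.0, 10.1, 10.2], [10.1, 10.2, 10.3], [10.4, 10.5, 10.6]]
f, df_b, df_w = one_way_anova_f(groups)
print(f"F = {f:.2f} with ({df_b}, {df_w}) degrees of freedom")  # F = 13.00 with (2, 6) ...
```
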

**Specificity/selectivity**

Specificity is a quantitative indication of the extent to which a method can distinguish between the
analyte of interest and interfering substances on the basis of signals produced under actual
experimental conditions. Random interferences should be determined using representative blank samples.

**Accuracy**

Accuracy refers to the closeness of agreement between the true value of the analyte concentration and
the mean result obtained by applying the experimental procedure to a large number of homogeneous
samples. It is related to systematic error and analyte recovery. Systematic errors can be assessed
using appropriate matrix-matched certified reference materials or by applying alternative analytical
techniques.
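
Bias and recovery against a certified reference material can be computed as sketched below. The replicate results, the certified value, and the unit are hypothetical, and expressing recovery as the mean result divided by the certified value, in percent, is one common convention assumed here.

```python
from statistics import mean


def bias_and_recovery(results, certified_value):
    """Bias (estimate of systematic error) and percent recovery versus a reference value."""
    m = mean(results)
    bias = m - certified_value  # negative bias: the method reads low
    recovery_pct = 100.0 * m / certified_value
    return bias, recovery_pct


# Hypothetical replicate results on a matrix-matched CRM certified at 50.0 mg/kg
results = [49.1, 50.3, 49.6, 49.8, 50.2]
bias, recovery = bias_and_recovery(results, certified_value=50.0)
print(f"bias = {bias:.2f} mg/kg, recovery = {recovery:.1f} %")
```
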

**Precision**

Precision is evaluated by comparing results obtained from samples prepared to test the following conditions:

Repeatability expresses the precision under the same operating conditions over a short interval of
time. Repeatability is also termed intra-assay precision.

Intermediate precision expresses within-laboratory variations: different days, different analysts,
different equipment, etc.

Reproducibility expresses the precision between laboratories (collaborative studies, usually applied to
standardization of methodology).
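
Repeatability and intermediate precision are often estimated together from a nested design, for example several days with replicate measurements on each day, using one-way analysis of variance. The sketch below assumes a balanced design (the same number of replicates per day); the data and the function name `precision_estimates` are hypothetical. The same calculation is commonly applied to interlaboratory data, with laboratories as the grouping factor, to estimate reproducibility.

```python
import math
from statistics import mean


def precision_estimates(runs):
    """Repeatability and intermediate-precision SDs (and RSDs in %) from a balanced design.

    runs: list of replicate lists, one per day (or analyst, or instrument).
    """
    p = len(runs)
    n = len(runs[0])
    if p < 2 or n < 2 or any(len(r) != n for r in runs):
        raise ValueError("need at least 2 runs with the same number (>= 2) of replicates")
    run_means = [mean(r) for r in runs]
    grand = mean(run_means)
    # Within-run mean square: pooled replicate variance (repeatability)
    ms_within = sum((x - m) ** 2 for r, m in zip(runs, run_means) for x in r) / (p * (n - 1))
    # Between-run mean square
    ms_between = n * sum((m - grand) ** 2 for m in run_means) / (p - 1)
    # Between-run variance component, truncated at zero if negative
    var_between = max(0.0, (ms_between - ms_within) / n)
    s_r = math.sqrt(ms_within)
    s_ip = math.sqrt(ms_within + var_between)
    return s_r, s_ip, 100.0 * s_r / grand, 100.0 * s_ip / grand


# Hypothetical assay results (%) on three days, three replicates per day
runs = [[99.0, 100.0, 101.0], [101.0, 102.0, 103.0], [100.0, 101.0, 102.0]]
s_r, s_ip, rsd_r, rsd_ip = precision_estimates(runs)
print(f"repeatability SD = {s_r:.3f} (RSD {rsd_r:.2f} %)")
print(f"intermediate precision SD = {s_ip:.3f} (RSD {rsd_ip:.2f} %)")
```
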