Specifying Statistical Quantities
Notes from ISO 11404
A datatype consists of 3 constructs –
- The value space – the set of values the type describes
- Properties – the set of axioms the values follow
- Characterizing operations – the distinguishing operations (from other datatypes) that are allowed
The properties have the following features –
- Equality
- Order
- Bound
- Cardinality
- Exact and Approximate
- Numeric
Every datatype has the equality property, as this is needed for data to be copied from disk to register and back, or copied from one paper to another.
Order has to do with whether the values are ordered and what type of order (total or partial) that might be.
Bound has to do with whether there are bounds on the set of values; for example, the set of all non-negative integers is bounded below by zero.
Cardinality describes the number of values in the value space, which may be finite, countably infinite, or uncountably infinite. [[A set is countably infinite if its elements can be put into one-to-one correspondence with the natural numbers; e.g., the natural numbers, the integers, the rational numbers, and polynomials with integer coefficients are countably infinite, but the real numbers are not.]]
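As an illustration of the one-to-one correspondence mentioned above, here is a minimal sketch pairing the natural numbers with the integers. The function names are hypothetical and not taken from ISO 11404; the pairing shown is only one of many possible.

```python
def nat_to_int(n: int) -> int:
    """Map the naturals 0, 1, 2, 3, 4, ... to the integers 0, 1, -1, 2, -2, ..."""
    if n % 2:                 # odd naturals go to the positive integers
        return (n + 1) // 2
    return -(n // 2)          # even naturals go to zero and the negatives


def int_to_nat(k: int) -> int:
    """Inverse of nat_to_int, showing the pairing is one-to-one."""
    return 2 * k - 1 if k > 0 else -2 * k


first_five = [nat_to_int(n) for n in range(5)]
```

Because every integer is reached exactly once, the integers are countably infinite. No such pairing exists for the real numbers.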
The exact/approximate property declares whether the values represent their meaning exactly. E.g., floating point numbers are approximate: many decimal values, such as 0.1, have no exact binary floating point representation.
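A small sketch of the exact versus approximate distinction, using Python's built-in float (binary floating point) and the standard-library Fraction type (exact rationals). The variable names are illustrative only.

```python
from fractions import Fraction

# Approximate: 0.1 and 0.2 are stored as the nearest binary floating point values.
approx_sum = 0.1 + 0.2

# Exact: rationals are stored as integer numerator/denominator pairs.
exact_sum = Fraction(1, 10) + Fraction(2, 10)

print(approx_sum == 0.3)             # False
print(exact_sum == Fraction(3, 10))  # True
```

The equality property still holds for both types; what differs is whether the stored value is exactly the value that was meant.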
For the statistical datatypes, here are the properties that apply –
- Nominal: equality
- Ordinal: nominal plus ordered
- Interval: ordinal plus numeric, since numeric implies that intervals (differences) are meaningful, i.e., x - y = (x + z) - (y + z) for all x, y, z
- Ratio: interval; there are no additional properties we can supply to account for the need for a true zero
The characterizing operations, however, add the appropriate additional conditions. The field operations for rationals and reals make zero the additive identity, and this is sufficient to provide a true zero.
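The cumulative structure described above can be sketched as follows. This is only an illustration: the `Level` enumeration, the operation names, and the `MIN_LEVEL` table are hypothetical and not part of ISO 11404. The operation assigned to each level follows the usual reading of the four families and is an assumption here.

```python
from enum import IntEnum
from fractions import Fraction


class Level(IntEnum):
    """Hypothetical encoding of the four statistical datatype families."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Lowest level at which each illustrative operation is meaningful (assumed).
MIN_LEVEL = {
    "equal": Level.NOMINAL,        # equality property
    "less_than": Level.ORDINAL,    # order property
    "difference": Level.INTERVAL,  # numeric property
    "quotient": Level.RATIO,       # needs a true zero
}


def is_allowed(level: Level, operation: str) -> bool:
    """Return True if `operation` is meaningful for data at `level`."""
    return level >= MIN_LEVEL[operation]


def difference_is_shift_invariant(x, y, z) -> bool:
    """Check the interval condition x - y == (x + z) - (y + z)."""
    return x - y == (x + z) - (y + z)
```

For example, `is_allowed(Level.INTERVAL, "quotient")` is False, while `is_allowed(Level.RATIO, "quotient")` is True. Checking the interval condition with exact rationals, e.g. `difference_is_shift_invariant(Fraction(1, 3), Fraction(1, 7), Fraction(5, 2))`, avoids floating point rounding.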
Units needed for Statistical Quantities:
- Finite population – a population for which it is possible to count its units
- Mean – the arithmetic average of a set of numbers, i.e., their sum divided by how many there are
- Total – sum of the values for some characteristic of all units
- Index – the change in some aggregate relative to the value of the aggregate at a reference period
- Ratio – result of dividing one measure by another
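The quantities listed above can be sketched in a few lines. The function names and the sample figures are made up for illustration. Scaling the index so that the reference period equals 100 is a common convention, assumed here rather than taken from the definitions above.

```python
def total(values):
    """Sum of the values of some characteristic over all units."""
    return sum(values)


def mean(values):
    """Arithmetic mean: the total divided by the number of units."""
    return total(values) / len(values)


def ratio(numerator, denominator):
    """Result of dividing one measure by another."""
    return numerator / denominator


def index(current, reference, base=100):
    """Current aggregate relative to the reference-period aggregate,
    scaled so that the reference period equals `base` (assumed convention)."""
    return base * current / reference


# Hypothetical finite population of three units, observed in two periods.
reference_period = [10, 20, 30]
current_period = [15, 25, 35]

population_size = len(reference_period)  # a finite population can be counted
change_index = index(total(current_period), total(reference_period))
```

Here the totals are 60 and 75, so `change_index` is 125.0, i.e., the aggregate is 25% above its reference-period value.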
Statistical datatype families
- Nominal – unordered named categories
- Ordinal – nominal categories that are ordered
- Interval – quantitative data where differences between values are meaningful
- Ratio – interval data where a value of zero means absence of the quantity measured
Unit of measure – a definite magnitude, established by convention, and used as a standard for measurement (modified from Wikipedia)
Additional Definitions from Statistics Canada:
Standard Deviation and Variance – see http://f3apache1/tpv2alpha/alpha-eng.html?lang=eng
Definition for Sampling Variance – see section 3.4.1 of the following Statistics Canada publication: http://www.statcan.gc.ca/pub/12-587-x/12-587-x2003001-eng.pdf
Definition of Index and other terms – http://www.statcan.gc.ca/edu/power-pouvoir/glossary-glossaire/5214842-eng.htm#p
See 2017-03-10 Meeting notes for more definitions of some of these concepts.
See XML Schema Datatypes for more on how to create definitions: https://www.w3.org/TR/xmlschema-2/