Modules

=Modules=

Module Drafts
This is a pompous title, since these are very rough and preliminary drafts (essentially written as a "stream of consciousness", at least in their first version). Still, it is a way of keeping the work public and open to suggestions, reprimands, and whatnot. (updated 10/30)
 * A **short** and quick introduction to what our stats course should be about (in progress) [[file:intro.pdf]] (updated 10/30)
 * We introduce some common //descriptive// methods by considering situations where we are able to observe //the whole population//, not just a sample.
 * A **short** and quick introduction to probability models, and how they can guide us in looking at data (even when we are still in "descriptive mode") (also very much in progress) [[file:module2.pdf]] (updated 10/30)

Module Plan
The structure of the course could be as follows (there are changes from the structure implied in the first draft):


 * 1) **Introduction**
 * 2) **Complete Data Sets:** Descriptive statistics applied to //samples// require some probabilistic model before they can support significant statements, since on their own they can only //describe// the data in the sample, and nothing more. It therefore makes sense to introduce the most common tools in the setting of //complete observations//, that is, data on a relatively small and **well defined** population that needs to be summarized. This is not a //Census//, which involves a very large and not completely well-defined population (people are born, die, immigrate, and emigrate every second). A census is a fascinating topic for statistics (how to address the innumerable errors that are inevitable in such a large endeavor), but it is completely outside the scope of this course.
 * 3) **A Primer on Probability:** a simplified, but precise introduction to the main probabilistic tools we will employ. Emphasis should be on the concept of //Random Variable//, as that is what we will be concerned with.
 * 4) **Inferential Statistics:** this would be the core of the course. Thanks to powerful software support, the emphasis should be on the concepts, the scope and the limitations of the tools, rather than the technicalities of calculations. This would be broken down into:
 * 5) **Interval Estimation:** Point estimation may be hinted at, but it is a topic prone to naive use, since a more sophisticated discussion, such as one considering the efficiency of an estimator, is outside the scope of an introductory course. Interval estimation, on the other hand, as applied to standard problems like estimating the mean of a Gaussian Random Variable, allows for a simple introduction to its scope and logic. If time allows, examples could be provided for other cases, such as estimating the variance of a Gaussian Random Variable, the mean of an Exponential Random Variable, and so on.
 * 6) **Statistical Tests:** We obviously will have to stick to parametric testing (though a generic reference to nonparametric tests could provide some support when discussing //alternative hypotheses// and the //power of a test//). We will naturally dovetail with the examples discussed in the //Interval Estimation// module. However, a discussion of the power of a test, within a simple setting such as testing for the mean of a Gaussian variable, is important, as it illuminates the meaning of the result of a test. Most courses at this level include some //Analysis of Variance//, and we will include that too.
 * 7) **Linear Models:** Most courses at this level also include a brief introduction to //Linear Models//. As a topic in multivariate statistics, germane to //Point Estimation//, it is somewhat more delicate than the previous material, but it is of such widespread use that it is worth including. At this level the justification for Least Mean Square methods is going to be somewhat intuitive, but there will be an effort to clarify the assumptions implied by the method.
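
As a rough illustration of the "standard problems" mentioned for the //Interval Estimation// and //Statistical Tests// modules, here is a minimal sketch in Python (not the course's own tool, and not taken from the drafts). It assumes the simplest textbook setting: a Gaussian variable whose standard deviation is known, so that the standard normal distribution can be used throughout. The function names and the sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean


def mean_confidence_interval(sample, sigma, confidence=0.95):
    """Two-sided interval for the mean of a Gaussian variable.

    Assumption: the standard deviation `sigma` is KNOWN, so the
    interval is built from standard normal quantiles.
    """
    n = len(sample)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # e.g. about 1.96 for 95%
    center = fmean(sample)
    half_width = z * sigma / sqrt(n)
    return center - half_width, center + half_width


def one_sided_power(mu0, mu1, sigma, n, alpha=0.05):
    """Power of the one-sided z-test of H0: mean = mu0,
    evaluated at the specific alternative mean = mu1 (with mu1 >= mu0).

    Assumption: Gaussian data with KNOWN standard deviation `sigma`.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # rejection threshold in z units
    shift = (mu1 - mu0) * sqrt(n) / sigma      # how far the alternative sits from H0
    return 1 - NormalDist().cdf(z_alpha - shift)


# Hypothetical measurements, with sigma assumed known to be 0.2
measurements = [4.8, 5.1, 5.0, 5.3, 4.9, 5.2]
low, high = mean_confidence_interval(measurements, sigma=0.2)
print(f"95% interval for the mean: ({low:.3f}, {high:.3f})")

# Power grows with the sample size for a fixed alternative
for n in (10, 50):
    print(n, round(one_sided_power(mu0=0.0, mu1=0.5, sigma=1.0, n=n), 3))
```

The second function is meant to show why the power discussion matters: for a fixed alternative, the chance of detecting it depends heavily on the sample size, which changes how a "non-significant" result should be read.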

Module 1: Complete Data Sets
The main point of this module is the introduction of descriptive statistics. There is an efficiency problem here. It does not seem worthwhile to reinvent the wheel by producing a list of descriptive methods from scratch, when such lists are widely available from many sources. Copyright and license issues prevent the inclusion of most of these sources in the course, but external references, in particular to some excellent open documents whose license prevents their inclusion here, should be enough to cover the enumeration issue. Besides, some of the standard tools are somewhat quaint in the age of the graphical computer, as they were developed in, and motivated by, the age of the typewriter. The plan is to concentrate on the tools that are easily accessible from the standard //Gnumeric// menu, and that are, indeed, the most commonly used.
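
To make the idea of summarizing a //complete// data set concrete, here is a minimal sketch in Python rather than //Gnumeric// (the course's intended tool), using only the standard library. Because the data are assumed to be the whole population, it uses the population standard deviation and the "inclusive" quantile method. The function name and the data are hypothetical.

```python
from statistics import fmean, median, pstdev, quantiles


def describe_population(values):
    """Common descriptive summaries for a COMPLETE data set.

    Assumption: `values` is the whole population, not a sample, so the
    population standard deviation (pstdev) is the appropriate one.
    """
    # method="inclusive" treats the data as containing the population extremes
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "min": min(values),
        "q1": q1,
        "median": median(values),
        "q3": q3,
        "max": max(values),
        "mean": fmean(values),
        "std_dev": pstdev(values),
    }


# Hypothetical data: final scores of all 8 students in one small class
scores = [62, 71, 75, 78, 80, 84, 90, 96]
summary = describe_population(scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The same summaries should be obtainable from a spreadsheet's built-in functions; the point here is only that, with the whole population in hand, no probabilistic model is needed to interpret them.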

About External Sources
Copyright and licensing issues prevent the direct inclusion of commercial sources, and also of noncommercial sources whose license is not compatible with the CC BY license we are working under. Still, some are very good, and external references will allow students to take advantage of them, even if this course, and its future editors, will not be able to adapt them for better integration. Compatible documents will be used, of course, with due attribution.