Wednesday, January 21, 2009

A Note on Statistics

In a recent post, I talked about tracking all the books I read. For the past several years I have kept a simple word-processing document and added up the pages by hand. Over the past couple of years I have become (in my opinion) quite skillful at extracting data with a spreadsheet. So recently, as I was updating my "Books I've Read - 2009" document, I realized that if I could build a Microsoft Excel workbook with 20+ worksheets that proposes housing rates and meal plan rates and is fully customizable down to the penny (including calculating how much institutional aid students are awarded), I could surely find a better way to collect and track my page data for any given year.

At the end of the year, with the spreadsheet I have designed, I will be able to tell approximately how many pages I read on any given day, how long it took me to read any particular book, and whether I read a particular genre more quickly than others, and I will have a graph of pages read over time. Sound cool? I think so, and it gets even better. Over several years I will be able to compare one year against another, overlay all of them on a single graph, and look for trends. Do I read more in the summer? Do I read more at the end of the year? Of course, I already have a pretty good idea of the answers, but isn't that what statistics does best? Manipulating data to show us what we already knew?
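
For anyone curious what those spreadsheet calculations boil down to, here is a minimal sketch in Python rather than Excel. The record layout (title, genre, pages, start date, finish date), the sample books, and the function names are all hypothetical, made up for illustration. "Pages on a given day" assumes each book's pages are spread evenly across the days I spent reading it, which is only an approximation.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample log: (title, genre, pages, start date, finish date).
# Made-up entries for illustration, not an actual 2009 reading list.
BOOKS = [
    ("Book A", "Fantasy", 400, date(2009, 1, 1), date(2009, 1, 10)),
    ("Book B", "History", 300, date(2009, 1, 11), date(2009, 1, 15)),
    ("Book C", "Fantasy", 250, date(2009, 1, 16), date(2009, 1, 20)),
]


def days_to_read(start, finish):
    """How long a book took, counting both the start and finish days."""
    return (finish - start).days + 1


def pages_on(day, books):
    """Approximate pages read on one day.

    Assumes each book's pages are spread evenly over its reading days.
    """
    return sum(
        pages / days_to_read(start, finish)
        for _title, _genre, pages, start, finish in books
        if start <= day <= finish
    )


def pages_per_day_by_genre(books):
    """Average reading speed (pages per day) for each genre."""
    pages = defaultdict(int)
    days = defaultdict(int)
    for _title, genre, n_pages, start, finish in books:
        pages[genre] += n_pages
        days[genre] += days_to_read(start, finish)
    return {genre: pages[genre] / days[genre] for genre in pages}


def cumulative_pages(books):
    """(finish date, running page total) pairs for a pages-over-time graph."""
    total = 0
    points = []
    # Sort by finish date (index 4) so the running total grows in order.
    for _title, _genre, n_pages, _start, finish in sorted(books, key=lambda b: b[4]):
        total += n_pages
        points.append((finish, total))
    return points


if __name__ == "__main__":
    print(days_to_read(date(2009, 1, 1), date(2009, 1, 10)))
    print(pages_on(date(2009, 1, 5), BOOKS))
    print(pages_per_day_by_genre(BOOKS))
    print(cumulative_pages(BOOKS))
```

In this invented sample, Fantasy works out to about 43 pages per day (650 pages over 15 days) versus 60 for History, which is exactly the kind of genre comparison described above. Overlaying several years would just mean running `cumulative_pages` on each year's log and plotting the results on one chart.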

I remember my statistics class from my sophomore year of college. It was an introductory course, and, as a side note, my spreadsheet skills did not start in that class but rather in an introductory physics class called Physical Computations, or something like that.

Anyway, if you really think about it, statistics is a powerful tool. If you strip away all the political connotations and stop skewing data, statistics might be able to tell us everything. For example, when we plot data points on a graph we can see trends. The dots never all fall directly on the "line of best fit," and my conjecture is that this is simply because we have not yet described the data points accurately enough. So if we add another dimension to our graph (making it three-dimensional), we have more data describing each point, and each point is described more accurately.
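
For reference, the usual "line of best fit" is the ordinary least-squares line. Here is a small Python sketch of it; the day and page numbers are invented for illustration, and the function names are my own. The residuals are the vertical gaps between each dot and the line, the misses that the paragraph above attributes to points not yet being described well enough.

```python
# Hypothetical data: day number vs. cumulative pages read (made-up values).
days = [1, 2, 3, 4, 5]
pages = [30, 75, 100, 160, 190]


def best_fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    slope = num / den
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(xs, ys, slope, intercept):
    """Vertical distance from each point to the fitted line."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


if __name__ == "__main__":
    slope, intercept = best_fit_line(days, pages)
    print("slope:", slope, "intercept:", intercept)
    print("residuals:", residuals(days, pages, slope, intercept))
```

Adding another "description" to each point, as suggested above, turns this into multiple regression: each extra variable gets its own coefficient, and the fit is chosen to minimize the same sum of squared residuals.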

Of course, we still have outliers and data points that do not fall on the line of best fit. Now I'm sure we are all thinking back to those introductory statistics classes and wondering how this could accurately predict anything (let alone everything). But think about what we are comparing: with so little information, how could we predict anything? Take a two-dimensional comparison such as pages over time. Tracking pages and time alone is not enough; there is no way to tell how many pages I will read on a given day just from how many pages I have read in the past. However, if we start describing these points better, we come a lot closer to being able to predict. How about adding a few more "descriptions," like my mood after work, other engagements I may have, how satisfied I am with my current book, and even how much oxygen is in my lungs at the time (since more oxygen may mean more brain stimulation and a higher reading volume)?

Do you see where I am going here? Give me a graph with an infinite number of dimensions and I could predict anything, at any given time.

So statistics is quite a powerful tool, and not just for proposing budgets, setting annual residence hall rates, and tracking books read. It also provides some fun bar tricks (if you are lucky). In a group of just 23 people, there is slightly better than a 50% chance that two of them share a birthday; with 25 people the chance is about 57%. Sound illogical? It's not. I once won a free beer with this.
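
The bar trick rests on a standard calculation: work out the chance that everyone's birthday is different, then subtract that from 1. Here is a minimal Python sketch, assuming 365 equally likely birthdays and ignoring February 29 (real birthdays are not perfectly uniform, so treat the numbers as close approximations). The function names are hypothetical.

```python
def shared_birthday_probability(n, days_in_year=365):
    """Chance that at least two of n people share a birthday.

    Assumes every birthday is equally likely and ignores February 29.
    """
    if n > days_in_year:
        return 1.0  # pigeonhole principle: a match is guaranteed
    p_all_different = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_different *= (days_in_year - k) / days_in_year
    return 1.0 - p_all_different


def smallest_group_for(target, days_in_year=365):
    """Smallest group size whose shared-birthday chance reaches target (0 < target <= 1)."""
    n = 1
    while shared_birthday_probability(n, days_in_year) < target:
        n += 1
    return n


if __name__ == "__main__":
    for n in (10, 23, 25, 50):
        print(n, round(shared_birthday_probability(n), 3))
    print(smallest_group_for(0.5))
```

Running it shows 23 is the smallest group where the odds pass 50%, and a group of 25 puts them at roughly 57%. The surprise comes from counting pairs: 23 people already form 253 different pairs, each a chance for a match.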

Good reading (and analyzing),

Plants and Books


  1. But then, the more information statistics gives us to predict and forecast... does that imply that we subconsciously do what we predict? For example, if the data I collect tells me that I am a slower reader of Fantasy books, will I just naturally read slower because my data tells me that is how I typically perform? In my opinion, the key to statistics is what we do with that information: not having information just for the sake of having it, but using it to change and improve.

    PS -- I am also a nerd and enjoy creating graphs and trends in Excel to see what my information is telling me. :)

  2. Knowledge is just one of an infinite number of factors in why we choose to do what we do. It is not the sole decider of, say, pages/subject per day.

  3. If only everyone appreciated and understood that concept...

