Fair Access to Publicly Funded Research Results

The EFF (Electronic Frontier Foundation) has an initiative to secure public access to publicly funded research.
I have mixed feelings on this issue.
On the one hand, we as taxpayers have paid for this research and should have a right to read it. We shouldn’t have to subscribe to extremely expensive research databases that charge for access, as we currently do. Not having to wait for the peer-reviewed journal process will mean faster access to more current research. On that I agree.

BUT, much government-funded research produces garbage because it was poorly conceived, poorly executed, or done with an agenda that is contrary to the public good and truthfulness. Many people won’t be able to grasp how the data was analyzed, as this is rarely disclosed fully in the research. The best system we currently have is the academic, or peer, journal review process. This peer review/rewrite process can take multiple years, during which the value of time-critical research results diminishes. But having peers review the research can save many from acting on the results of a bad piece of research.

Consider the economists Carmen Reinhart and Kenneth Rogoff, whose article about “austerity” … “has shaped political decisions over the best way to deal with foundering economies.” Many governments based financial strategies on this NON-peer-reviewed paper, and less than 3 years after publication a graduate student at the University of Massachusetts found math errors and omissions, leading some to say that “the Reinhart/Rogoff claim was ideology, not social science.” [ibid]

Care should be taken when reviewing Research and Statistical Analysis. NEVER look at research as if it is a sound bite by a politician.
Still, I think the public might have challenged the research sooner if there were government laws requiring disclosure.
To be fair to Carmen Reinhart and Kenneth Rogoff, they not only released their research to the doctoral student Thomas Herndon. As he said recently on The Colbert Report, they also sent him the original spreadsheets they had used in the calculations, and it was on page one, near the top, that he found the math error. Proof of the value of peer review.

I include below the text of the message that is sent to your representatives in Washington:
As your constituent, I am urging you to support the Fair Access to Science & Technology Research Act (FASTR is S. 350 in the Senate and H.R. 708 in the House).

Government agencies like the National Science Foundation invest millions of taxpayer dollars into scientific research every year, but the resulting research is locked up in expensive journals. As a result students and citizens have difficulty accessing information they need; professors have a harder time reviewing and teaching the state of the art; and cutting-edge research remains hidden.

FASTR fixes this. The bill makes government agencies design and implement a plan to facilitate public access to the results of their investments. Any researcher who receives federal funding must submit a copy of resulting journal articles to the funding agency, which will then make that research widely available within six months.

Please secure our rights as taxpayers and promote the progress of science by supporting FASTR.

If YOU would like to let your representatives in Washington, D.C. know you support passage of this bill, you can have an email sent to them by going to this web address: Tell my representative I want Publicly Funded Research Available to the Public. Just enter your zip code in the box on the right side; it will look up who your representatives are and send the message to them.

For California there is an additional initiative. You can let your representatives know by clicking here: Let California representatives know I want state-funded research to be available to the public.

SAS

I came into the SAS world neither through a business nor during my college studies.
I was exposed to SAS because the PhD students I was helping with database issues needed their databases to work with it: they used SAS for statistical analysis.
So the first thing I learned was importing and exporting datasets.
I took a crash course in variables and functions, then decided to attend the SAS Institute workshops and get the SAS Base Programmer certification.
I recommend that you have access to a working copy of SAS to practice on and become familiar with SAS before going through the workshops. It will be much less stressful.

As a learning tool, SAS has a version of their Enterprise Guide program available for license for around $200 per year. Its biggest limitations currently are a limited ability to work with Microsoft Excel worksheets and files and the inability to use your own datasets. SAS, the company, obviously wants you to purchase their full commercial products to do your own data analyses. They sell annual usage licenses based on which functional modules you need, and each module typically costs $2,000 to $10,000 per year. That is cost-prohibitive for most students.

Luckily, most students have a supportive professor who will allow them to use a license for research purposes, but that’s not guaranteed.

SAS has released some less expensive products since 2012 and

If you are studying for the Certification exams follow this link.
If you are wondering how to code your own statistical analysis follow this link.
I will try to tie together the statistics theory and the code snippets to help you get the job done.

 

Regressions: Linear and Multi-Variant {GASP}

I don’t think the person exists who doesn’t take a step backwards when they first hear they need to perform regressions, especially the dreaded Multi-Variant Linear Regression (a regression with more than one explanatory variable, which textbooks usually call multiple linear regression).
Let me help everyone who is frightened by this. It looks scary, and the formulas look like some secret spy code, but the good news is that most of us have already done a similar analysis; we just didn’t write the regression model out.
Simply put, a regression places the “variable of interest” (the dependent variable) on one side of the equation and the variables that we believe contribute to or explain it on the other side.
Even that explanation sounds scary to me, as simplified as it is.
The example used in many textbooks is the salary analysis.

Salary = years of experience + years of education + average salary in a field

When it is put into the standard form we simplify variable names and add some requisite pieces to the equation that I’ll explain in a minute.

Variables
S – Salary (the dependent variable from dataset)
E – years of Experience (Independent variable from dataset)
D – years of Education (Independent variable from dataset)
F – average Salary in Field (Independent variable from dataset)
α – the Y-axis intercept
β – a variable’s coefficient; there is one for each independent variable, so subscripts are assigned to them: β1, β2, β3…
μ – error term component; this is effectively a stand-in variable for all the possible variables we do not know of or do not have data for.

S = α + β1E + β2D + β3F + μ

The 3 parts you can’t do without: α, β, μ.
α – the Y-axis intercept: the value of S when every independent variable is zero.
β – the coefficient of a variable, which measures how that variable “explains” the “variable of interest”. It may look like a simple share, as in “this variable contributes 45% of the variable of interest,” but that IS NOT what a coefficient does in a regression. It is the change we expect in the variable of interest when that one variable rises by one unit and the others are held constant.
μ – error term component, sometimes noted as U (unknown), e or ε (error).
The goal of a regression is to test how well the variables we hypothesize explain the dependent variable of interest. It would be great if our hypothesized equation (the model) explained 100% of the variable of interest. Well, let me tell you, that won’t happen. The μ error term absorbs whatever our variables leave unexplained, but the variables themselves still won’t get us all the way there.
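
To make the roles of α, β and μ concrete, here is a minimal Python sketch that plugs one person's numbers into the salary model above. Every number in it (the intercept, the three coefficients and the sample person) is made up for illustration and is not an estimate from any real dataset; the function names are likewise just labels I chose.

```python
# Hypothetical values, chosen only to illustrate S = α + β1E + β2D + β3F + μ.
ALPHA = 10.0    # α: intercept, in thousands of dollars
BETA_E = 1.5    # β1: change in salary per extra year of experience
BETA_D = 2.0    # β2: change in salary per extra year of education
BETA_F = 0.5    # β3: change in salary per extra $1,000 of field average


def predicted_salary(experience, education, field_average):
    """Return the part of salary the model explains (everything except μ)."""
    return (ALPHA
            + BETA_E * experience
            + BETA_D * education
            + BETA_F * field_average)


def error_term(actual_salary, experience, education, field_average):
    """Return μ for one person: actual salary minus the model's prediction."""
    return actual_salary - predicted_salary(experience, education, field_average)


# A made-up person: 5 years of experience, 16 years of education,
# in a field averaging $60,000, actually earning $82,000.
print("explained by the model:", predicted_salary(5, 16, 60))
print("left in the error term:", error_term(82, 5, 16, 60))
```

In this sketch β1 = 1.5 does not say that experience accounts for some percentage of salary. It says one more year of experience moves the predicted salary by 1.5 (thousand dollars) while education and field average stay fixed.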

It is not uncommon to find that one of the independent variables does not explain the dependent variable at all, at which point a new model omitting that variable is in order, unless the hypothesis we want to support is precisely that the variable has nothing to do with the dependent variable.
If we go back to the above model and find that D (years of Education) did not explain Salary, we may be inclined to re-specify the model without variable D. Personally, I would re-examine the datasets for issues first, because logically we probably all believe years of education directly affects salary. Why else would we have studied stats?
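
The re-specification step can be sketched in Python. This is a bare-bones ordinary least squares fit written from scratch (normal equations solved by Gaussian elimination), not the output or method of any SAS procedure, and the eight data rows are invented for illustration. It fits the full model, then the model without D, and reports R² (the share of the variation in salary each model explains) for both.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Move the row with the largest entry in this column into place.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):          # back substitution
        known = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - known) / aug[r][r]
    return x


def fit_ols(rows, y):
    """Least squares fit; returns [alpha, beta1, beta2, ...]."""
    design = [[1.0] + list(row) for row in rows]   # leading 1.0 is for alpha
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(rows, y, coefs):
    """Share of the variation in y explained by the fitted model."""
    fitted = [coefs[0] + sum(b * v for b, v in zip(coefs[1:], row)) for row in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Invented data, in years and thousands of dollars: (E, D, F) and salary S.
people = [(1, 12, 40), (3, 16, 55), (5, 14, 48), (7, 18, 70),
          (10, 16, 60), (12, 12, 45), (15, 20, 80), (20, 16, 65)]
salary = [56.5, 72.0, 70.0, 93.0, 86.0, 76.5, 112.0, 103.0]
people_no_d = [(e, f) for e, d, f in people]       # same rows with D removed

full = fit_ols(people, salary)
without_d = fit_ols(people_no_d, salary)

print("full model coefficients:", full)
print("full model R^2:", r_squared(people, salary, full))
print("without D coefficients:", without_d)
print("without D R^2:", r_squared(people_no_d, salary, without_d))
```

Dropping a variable can never raise R² in a least squares fit, so the comparison is about how far it falls. A negligible drop suggests D added little beyond E and F; a large drop suggests it mattered, or that the data deserve a second look.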

Probability And Permutations

Among the first areas that everyone is introduced to in the study of statistics are the concepts of Probability and Permutations.
Most people loosely call probability “odds,” and the coin toss is the usual introduction to the concept.
A coin is generally considered to have two sides: one Heads, one Tails. So if the coin is tossed into the air and lands with one of those sides facing up, what are the “odds,” or probability, of that side being heads?
The number of possible outcomes is 2: either Heads or Tails.
Tossing the coin ONCE means only one of the possibilities will occur, and they both are just as likely to occur.
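
The coin toss can be checked with a few lines of Python. This is only a sketch: the function names are my own, the coin is assumed to be perfectly fair, and the simulated tosses come from Python's pseudo-random generator rather than a real coin. The first function counts favorable outcomes over all equally likely outcomes, which for one toss gives 1/2; the second tosses a simulated coin many times to show the observed share of Heads settling near that value.

```python
from fractions import Fraction
import random


def probability(event, outcomes):
    """Favorable outcomes divided by all equally likely outcomes."""
    favorable = sum(1 for outcome in outcomes if outcome == event)
    return Fraction(favorable, len(outcomes))


def observed_share(event, outcomes, tosses, seed=0):
    """Toss a simulated coin and return how often `event` came up."""
    rng = random.Random(seed)          # fixed seed so reruns match
    hits = sum(1 for _ in range(tosses) if rng.choice(outcomes) == event)
    return hits / tosses


COIN = ["Heads", "Tails"]

print("P(Heads) on one toss:", probability("Heads", COIN))
print("share of Heads in 10,000 simulated tosses:",
      observed_share("Heads", COIN, 10_000))
```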