Thursday, 2 February 2012

REFLECTION:
 
            This module shows that constructing assessment tools is not as easy as it may seem, because you should consider everything in making them. You should look carefully at your assessment tools, since you need to weigh the advantages and disadvantages and go through the checklist for each one. Besides this, the module also helps us in making assessment tools by giving suggestions and pointers for each particular type of tool. On the other hand, this module makes us realize that we should be sincere in preparing assessment tools, because there are many matters to consider as part of learning assessment.


Reliability & Validity

We often think of reliability and validity as separate ideas but, in fact, they're related to each other. Here, I want to show you two ways you can think about their relationship.
One of my favorite metaphors for the relationship between reliability and validity is that of the target. Think of the center of the target as the concept that you are trying to measure. Imagine that for each person you are measuring, you are taking a shot at the target. If you measure the concept perfectly for a person, you are hitting the center of the target. If you don't, you are missing the center. The more you are off for that person, the further you are from the center.
The figure above shows four possible situations. In the first one, you are hitting the target consistently, but you are missing the center of the target. That is, you are consistently and systematically measuring the wrong value for all respondents. This measure is reliable, but not valid (that is, it's consistent but wrong). The second shows hits that are randomly spread across the target. You seldom hit the center of the target but, on average, you are getting the right answer for the group (but not very well for individuals). In this case, you get a valid group estimate, but you are inconsistent. Here, you can clearly see that reliability is directly related to the variability of your measure. The third scenario shows a case where your hits are spread across the target and you are consistently missing the center. Your measure in this case is neither reliable nor valid. Finally, we see the "Robin Hood" scenario -- you consistently hit the center of the target. Your measure is both reliable and valid (I bet you never thought of Robin Hood in those terms before).
Another way we can think about the relationship between reliability and validity is shown in the figure below. Here, we set up a 2x2 table. The columns of the table indicate whether you are trying to measure the same or different concepts. The rows show whether you are using the same or different methods of measurement. Imagine that we have two concepts we would like to measure, student verbal and math ability. Furthermore, imagine that we can measure each of these in two ways. First, we can use a written, paper-and-pencil exam (very much like the SAT or GRE exams). Second, we can ask the student's classroom teacher to give us a rating of the student's ability based on their own classroom observation.
The first cell on the upper left shows the comparison of the verbal written test score with the verbal written test score. But how can we compare the same measure with itself? We could do this by estimating the reliability of the written test through a test-retest correlation, parallel forms, or an internal consistency measure (See Types of Reliability). What we are estimating in this cell is the reliability of the measure.
The cell on the lower left shows a comparison of the verbal written measure with the verbal teacher observation rating. Because we are trying to measure the same concept, we are looking at convergent validity (See Measurement Validity Types).
The cell on the upper right shows the comparison of the verbal written exam with the math written exam. Here, we are comparing two different concepts (verbal versus math) and so we would expect the relationship to be lower than a comparison of the same concept with itself (e.g., verbal versus verbal or math versus math). Thus, we are trying to discriminate between two concepts and we would consider this discriminant validity.
Finally, we have the cell on the lower right. Here, we are comparing the verbal written exam with the math teacher observation rating. Like the cell on the upper right, we are also trying to compare two different concepts (verbal versus math) and so this is a discriminant validity estimate. But here, we are also trying to compare two different methods of measurement (written exam versus teacher observation rating). So, we'll call this very discriminant to indicate that we would expect the relationship in this cell to be even lower than in the one above it.
The four cells incorporate the different values that we examine in the multitrait-multimethod approach to estimating construct validity.
When we look at reliability and validity in this way, we see that, rather than being distinct, they actually form a continuum. On one end is the situation where the concepts and methods of measurement are the same (reliability) and on the other is the situation where concepts and methods of measurement are different (very discriminant validity). 

Reflection:

Item Analysis


After you create your objective assessment items and give your test, how can you be sure that the items are appropriate -- not too difficult and not too easy? How will you know if the test effectively differentiates between students who do well on the overall test and those who do not? An item analysis is a valuable, yet relatively easy, procedure that teachers can use to answer both of these questions.
To determine the difficulty level of test items, a measure called the Difficulty Index is used. This measure asks teachers to calculate the proportion of students who answered the test item accurately. By looking at each alternative (for multiple choice), we can also find out if there are answer choices that should be replaced. For example, let's say you gave a multiple choice quiz and there were four answer choices (A, B, C, and D). The following table illustrates how many students selected each answer choice for Question #1 and #2.
Question     A      B      C      D
#1           0      3      24*    3
#2           12*    13     3      2
* Denotes correct answer.
For Question #1, we can see that A was not a very good distractor -- no one selected that answer. We can also compute the difficulty of the item by dividing the number of students who chose the correct answer (24) by the number of total students (30). Using this formula, the difficulty of Question #1 (referred to as p) is equal to 24/30 or .80. A rough "rule-of-thumb" is that if the item difficulty is more than .75, it is an easy item; if the difficulty is below .25, it is a difficult item. Given these parameters, this item would be regarded as easy -- most (80%) of the students got it correct. In contrast, Question #2 is much more difficult (12/30 = .40). In fact, on Question #2, more students selected an incorrect answer (B) than selected the correct answer (A). This item should be carefully analyzed to ensure that B is an appropriate distractor.
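The Difficulty Index calculation can be sketched in a few lines of Python (a minimal illustration; the function and variable names are my own, and the response counts are copied from the table above):

```python
def difficulty_index(responses, correct_choice):
    """Proportion of students who answered the item correctly (p)."""
    total = sum(responses.values())
    return responses[correct_choice] / total

# Response counts from the table above; C is correct for #1, A for #2.
q1 = {"A": 0, "B": 3, "C": 24, "D": 3}
q2 = {"A": 12, "B": 13, "C": 3, "D": 2}

p1 = difficulty_index(q1, "C")  # 24/30 = .80, an easy item
p2 = difficulty_index(q2, "A")  # 12/30 = .40, a harder item
```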

Another measure, the Discrimination Index, refers to how well an assessment differentiates between high and low scorers. In other words, you should be able to expect that the high-performing students would select the correct answer for each question more often than the low-performing students.  If this is true, then the assessment is said to have a positive discrimination index (between 0 and 1) -- indicating that students who received a high total score chose the correct answer for a specific item more often than the students who had a lower overall score. If, however, you find that more of the low-performing students got a specific item correct, then the item has a negative discrimination index (between -1 and 0). Let's look at an example.

Table 2 displays ten students' results on three questions from a quiz. Note that the students are arranged with the top overall scorers at the top of the table.
Student     Total Score (%)    Q1    Q2    Q3
Asif        90                 1     0     1
Sam         90                 1     0     1
Jill        80                 0     0     1
Charlie     80                 1     0     1
Sonya       70                 1     0     1
Ruben       60                 1     0     0
Clay        60                 1     0     1
Kelley      50                 1     1     0
Justin      50                 1     1     0
Tonya       40                 0     1     0

"1" indicates the answer was correct; "0" indicates it was incorrect.
Follow these steps to determine the Difficulty Index and the Discrimination Index.
  1. After the students are arranged with the highest overall scores at the top, count the number of students in the upper and lower group who got each item correct. For Question #1, there were 4 students in the top half who got it correct, and 4 students in the bottom half.
  2. Determine the Difficulty Index by dividing the number who got it correct by the total number of students. For Question #1, this would be 8/10 or p=.80.
  3. Determine the Discrimination Index by subtracting the number of students in the lower group who got the item correct from the number of students in the upper group who got the item correct.  Then, divide by the number of students in each group (in this case, there are five in each group). For Question #1, that means you would subtract 4 from 4, and divide by 5, which results in a Discrimination Index of  0.
  4. The answers for Questions 1-3 are provided in the table below.
Item          # Correct (Upper group)   # Correct (Lower group)   Difficulty (p)   Discrimination (D)
Question 1    4                         4                         .80              0
Question 2    0                         3                         .30              -0.6
Question 3    5                         1                         .60              0.8
Now that we have the table filled in, what does it mean? We can see that Question #2 had a difficulty index of .30 (meaning it was quite difficult), and it also had a negative discrimination index of -0.6 (meaning that the low-performing students were more likely to get this item correct).  This question should be carefully analyzed, and probably deleted or changed. Our "best" overall question is Question 3, which had a moderate difficulty level (.60), and discriminated extremely well (0.8).
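The steps above can be sketched in Python (a minimal illustration; the 0/1 flags are copied from Table 2, with the rows already sorted by total score, highest first):

```python
# Each row: (student, [Q1, Q2, Q3] correctness flags), sorted by total score.
scores = [
    ("Asif",    [1, 0, 1]), ("Sam",     [1, 0, 1]),
    ("Jill",    [0, 0, 1]), ("Charlie", [1, 0, 1]),
    ("Sonya",   [1, 0, 1]), ("Ruben",   [1, 0, 0]),
    ("Clay",    [1, 0, 1]), ("Kelley",  [1, 1, 0]),
    ("Justin",  [1, 1, 0]), ("Tonya",   [0, 1, 0]),
]

def item_stats(scores, item):
    """Return (Difficulty Index, Discrimination Index) for one item."""
    half = len(scores) // 2
    upper = sum(flags[item] for _, flags in scores[:half])  # top scorers
    lower = sum(flags[item] for _, flags in scores[half:])  # bottom scorers
    p = (upper + lower) / len(scores)   # step 2: proportion correct
    d = (upper - lower) / half          # step 3: upper minus lower, per group
    return p, d

for item in range(3):
    p, d = item_stats(scores, item)
    print(f"Question {item + 1}: p={p:.2f}, D={d:+.1f}")
```

Running this reproduces the table: Question 2 comes out at p=.30 with D=-0.6, and Question 3 at p=.60 with D=0.8.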

Another consideration for an item analysis is the cognitive level that is being assessed.  For example, you might categorize the questions based on Bloom's taxonomy (perhaps grouping questions that address Level I and those that address Level II). In this manner, you would be able to determine if the difficulty index and discrimination index of those groups of questions are appropriate. For example, you might note that the majority of the questions that demand higher levels of thinking skills are too difficult or do not discriminate well.  You could then concentrate on improving those questions and focus your instructional strategies on higher-level skills.

Module 7: What is a Statistic?

In the mind of a statistician, the world consists of populations and samples. An example of a population is all 7th graders in the United States. A related example of a sample would be a group of 7th graders in the United States. In this particular example, a federal health care administrator would like to know the average weight of 7th graders and how it compares to other countries. Unfortunately, it is too expensive to measure the weight of every 7th grader in the United States. Instead, statistical methodologies can be used to estimate the average weight of 7th graders in the United States by measuring the weights of a sample (or multiple samples) of 7th graders.
Parameters are to populations as statistics are to samples.
A parameter is a property of a population. As illustrated in the example above, most of the time it is infeasible to directly measure a population parameter. Instead, a sample must be taken and a statistic for the sample calculated. This statistic can be used to estimate the population parameter. (A branch of statistics known as Inferential Statistics involves using samples to infer information about a population.) In the example above, the population parameter is the average weight of all 7th graders in the United States and the sample statistic is the average weight of a group of 7th graders.
A large number of statistical inference techniques require samples to be simple random samples, gathered independently. In short, this allows statistics to be treated as random variables. An in-depth discussion of these consequences is beyond the scope of this text. It is also important to note that statistics can be flawed due to large variance, bias, inconsistency, and other errors that may arise during sampling. Whenever performing or reviewing statistical analysis, a skeptical eye is always valuable.
Statistics take on many forms. Examples of statistics can be seen below.

Basic Statistics

When performing statistical analysis on a set of data, the mean, median, mode, and standard deviation are all helpful values to calculate. The mean, median, and mode are all estimates of where the "middle" of a set of data is. These values are useful when creating groups or bins to organize larger sets of data. The standard deviation measures how far, on average, the data points fall from the mean.

Mean and Weighted Average

The mean (also known as the average) is obtained by dividing the sum of observed values by the number of observations, n. Although data points fall above, below, or on the mean, it can be considered a good estimate for predicting subsequent data points. The formula for the mean is given below as equation (1). The Excel syntax for the mean is AVERAGE(starting cell: ending cell).

\bar{X} = \frac{\sum_{i=1}^{i=n}{X_i}}{n} (1)
However, equation (1) can only be used when the error associated with each measurement is the same or unknown. Otherwise, the weighted average, which incorporates the standard deviation, should be calculated using equation (2) below.

X_{wav} = \frac{\sum{w_i x_i}}{\sum{w_i}} (2)

where w_i = \frac{1}{{\sigma_i}^2} and x_i is the data value.
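A small Python sketch of equations (1) and (2) (the measurement values and per-point standard deviations are illustrative, not from the text):

```python
# Illustrative repeated measurements x_i with per-point standard deviations.
measurements = [10.1, 9.8, 10.3, 10.0]   # x_i
sigmas       = [0.2, 0.1, 0.4, 0.1]      # sigma_i for each measurement

# Equation (1): plain mean, valid when all errors are equal or unknown.
mean = sum(measurements) / len(measurements)

# Equation (2): inverse-variance weighted average, w_i = 1 / sigma_i^2.
weights = [1 / s**2 for s in sigmas]
weighted_mean = (sum(w * x for w, x in zip(weights, measurements))
                 / sum(weights))
```

Note how the weighted mean is pulled toward the more precise (smaller sigma) measurements.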

Median

The median is the middle value of a set of data containing an odd number of values, or the average of the two middle values of a set of data with an even number of values. The median is especially helpful when separating data into two equal-sized bins. The Excel syntax to find the median is MEDIAN(starting cell: ending cell).

Mode

The mode of a set of data is the value which occurs most frequently. The Excel syntax for the mode is MODE(starting cell: ending cell).

Considerations

Now that we've discussed some different ways in which you can describe a data set, you might be wondering when to use each one. If all the data points are relatively close together, the average gives you a good idea as to what the points are closest to. If, on the other hand, almost all the points fall close to one value, or a group of close values, but occasionally a value that differs greatly appears, then the mode might be more accurate for describing this system, whereas the mean would incorporate the occasional outlying data. The median is useful if you are interested in the range of values your system could be operating in: half the values should be above and half below, so you have an idea of where the middle operating point is.
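The effect of an outlier on each of these measures can be seen with Python's standard statistics module (the data set is an illustrative example):

```python
import statistics

# Illustrative data set: most values cluster near 3, with one outlier (100).
data = [2, 3, 3, 4, 5, 100]

mean   = statistics.mean(data)    # pulled far upward by the outlier
median = statistics.median(data)  # average of the two middle sorted values
mode   = statistics.mode(data)    # the most frequent value
```

Here the single outlier drags the mean up to 19.5, while the median (3.5) and mode (3) still describe where most of the data actually sits.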

Standard Deviation and Weighted Standard Deviation

The standard deviation gives an idea of how close the entire set of data is to the average value. Data sets with a small standard deviation have tightly grouped, precise data. Data sets with large standard deviations have data spread out over a wide range of values. The formula for standard deviation is given below as equation (3). The Excel syntax for the standard deviation is STDEV(starting cell: ending cell).

\sigma = \sqrt{\frac{1}{n-1}{\sum_{i=1}^{i=n}(X_i-\bar{X})^2}} (3)
Side Note: Biased Estimate of Population Variance
The standard deviation (the square root of variance) of a sample can be used to estimate a population's true variance. Equation (3) above is based on an unbiased estimate of population variance. Equation (3.1) below is another common method for calculating sample standard deviation, although it is a biased estimate. Although the estimate is biased, it is advantageous in certain situations because it has a lower variance. (This relates to the bias-variance trade-off for estimators.)

\sigma_{n} = \sqrt{\frac{1}{n}{\sum_{i=1}^{i=n}(X_i-\bar{X})^2}} (3.1)
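Equations (3) and (3.1) differ only in the divisor (n - 1 versus n), which a short Python sketch makes concrete (the data values are illustrative):

```python
import math

def stdev_unbiased(data):
    """Equation (3): divide the summed squared deviations by n - 1."""
    n = len(data)
    xbar = sum(data) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))

def stdev_biased(data):
    """Equation (3.1): divide the summed squared deviations by n."""
    n = len(data)
    xbar = sum(data) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in data) / n)

# Illustrative data with mean 5 and summed squared deviations of 32.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
```

For this data, equation (3.1) gives exactly 2.0, while equation (3) gives the slightly larger value sqrt(32/7), which is about 2.14; the n - 1 version is always a little larger.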
When calculating standard deviation values associated with weighted averages, equation (4) below should be used.

\sigma_{wav} = \frac{1}{\sqrt{\sum{w_i}}} (4)

The Sampling Distribution and Standard Deviation of the Mean

Population parameters follow all types of distributions: some are normal, others are skewed like the F-distribution, and some don't even have defined moments (mean, variance, etc.), like the Cauchy distribution. However, many statistical methodologies, like the z-test (discussed later in this article), are based on the normal distribution. How does this work? Most sample data are not normally distributed.
This highlights a common misunderstanding of those new to statistical inference. The distribution of the population parameter of interest and the sampling distribution are not the same. Sampling distribution?!? What is that?
Imagine an engineer estimating the mean weight of widgets produced in a large batch. The engineer measures the weight of N widgets and calculates the mean. So far, one sample has been taken. The engineer then takes another sample, and another, and another, and continues until a very large number of samples, and thus a large number of mean sample weights, have been gathered (assume the batch of widgets being sampled from is near infinite, for simplicity). The engineer has generated a sampling distribution.
As the name suggests, a sampling distribution is simply the distribution of a particular statistic (calculated for samples of a set size) for a particular population. In this example, the statistic is mean widget weight and the sample size is N. If the engineer were to plot a histogram of the mean widget weights, he/she would see a bell-shaped distribution. This is because the Central Limit Theorem guarantees that as the sample size approaches infinity, the sampling distribution of statistics calculated from such samples approaches the normal distribution.
Conveniently, there is a relationship between the sample standard deviation (σ) and the standard deviation of the sampling distribution (\sigma_{\bar{X}}, also known as the standard deviation of the mean, or the standard error). This relationship is shown in equation (5) below:

\sigma_{\bar{X}} = \frac{\sigma_{X}}{\sqrt{N}} (5)

An important feature of the standard deviation of the mean, \sigma_{\bar{X}}, is the factor \sqrt{N} in the denominator. As sample size increases, the standard deviation of the mean decreases, while the standard deviation, σ, does not change appreciably.
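The widget-sampling thought experiment, and equation (5), can be checked with a short simulation (a sketch using Python's standard library; the population and the sample counts are illustrative):

```python
import random
import statistics

random.seed(0)          # reproducible illustration
N = 25                  # size of each sample
num_samples = 2000      # number of repeated samples

# Repeatedly sample from a population with mean 0 and sigma = 1,
# recording the mean of each sample (the sampling distribution).
sample_means = [
    statistics.mean(random.gauss(0, 1) for _ in range(N))
    for _ in range(num_samples)
]

# Equation (5): the spread of the sample means should be close to
# sigma / sqrt(N) = 1 / sqrt(25) = 0.2.
spread = statistics.stdev(sample_means)
```

The computed spread lands near 0.2, as equation (5) predicts, and a histogram of sample_means would show the bell shape described above.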
Microsoft Excel has built in functions to analyze a set of data for all of these values. Please see the screen shot below of how a set of data could be analyzed using Excel to retrieve these values.
