








This document provides an overview of the key concepts and methods related to the reliability and validity of research instruments. It covers the factors that contribute to test unreliability, the four basic methods for assessing reliability (test-retest, alternate-form, split-half, and internal consistency), and the four forms of validity (predictive, concurrent, construct, and content). It also discusses the distinction between random (chance) error and systematic error and how these affect reliability and validity. Overall, the document offers an introduction to the fundamental principles and techniques for ensuring the quality and trustworthiness of research data collection instruments.
RELIABILITY
Reliability refers to a test's consistency. Reliability concerns the extent to which a measure is accurate and consistent: will it yield both exact and identical results on repeated measurements?
Factors that contribute to test unreliability
1. Test-Retest Method
Conceptually, the simplest method of assessing reliability is to correlate the measures or scores obtained from two administrations of the same test to the same subjects at two different times.
Suppose, for example, that a researcher has developed a scale measuring physicians' attitudes toward product selection by the pharmacist. In the retest method, the attitude scale would be administered to a group of physicians and then repeated with the same group approximately 1-2 weeks later. Intuitively, the retest method is an appealing technique for assessing reliability; it is logical, given the basic meaning of reliability. It is not, however, without serious problems and limitations. First, it is inconvenient to make dual measurements on the same population. Second, if the time period between tests is short, the scores may be more the result of memory than of test reliability.
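The retest coefficient itself is simply the correlation between the two sets of scores. As a minimal illustrative sketch (the scores below are hypothetical, not taken from the source), the calculation in Python might look like this:

    # Test-retest reliability as the Pearson correlation between two
    # administrations of the same scale (hypothetical scores for ten subjects).
    import statistics

    def pearson_r(x, y):
        """Pearson product-moment correlation between two equal-length score lists."""
        mx, my = statistics.mean(x), statistics.mean(y)
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        syy = sum((b - my) ** 2 for b in y)
        return sxy / (sxx * syy) ** 0.5

    time1 = [72, 65, 80, 58, 90, 77, 63, 85, 70, 68]  # first administration
    time2 = [70, 68, 78, 60, 88, 75, 66, 83, 72, 65]  # same subjects, ~2 weeks later
    print(f"Test-retest reliability: {pearson_r(time1, time2):.2f}")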
2. Alternate-Form Method
Reliability is determined by administering alternate forms of a test to the same people and computing the correlation between each person's scores on the two forms. This approach requires two forms of a test that parallel one another in content and in the mental operations required; the two tests must have items on one form matched to items on the other form, with corresponding items measuring the same quality. The approach can be used to assess the reliability of either of the two forms by comparison with the other, or to determine the extent to which the two forms are parallel; the latter determination is particularly important if one form is to be used as a pretest and the other as a posttest. The alternate-form method assesses reliability by measuring the construct with two different sets of scales that are intended to measure the same thing. The technique is used extensively in education, where two forms of an achievement test can be constructed to cover the same content with the same degree of difficulty. As with the retest method, reliability is estimated by calculating the correlation coefficient between the same respondents' scores on the two forms, collected at two different times.
Although this technique is superior to retest in some ways, it is critical to ensure that the problem of instrumentation does not creep in to cause further errors.
3. Split-Half Reliability
To assess internal consistency quickly, split a test into two halves, usually the odd-numbered items and the even-numbered items, and then correlate the scores obtained by each person on one half with those obtained on the other. This enables the researcher to determine whether the two halves of a test are measuring the same quality or characteristic. The obtained correlation coefficient (r1) is then entered into the Spearman-Brown formula to calculate the test reliability (r2):

r2 = n r1 / [1 + (n - 1) r1]

where n is the factor by which the test length is changed (n = 2 when the two halves are recombined into the full-length test).
It is estimated that the scale would need to be 4.0 times the present number of scale items (4.0 * 20 = 80 items) to achieve the desired reliability. The major problem with the split-halves reliability assessment is that the reliability coefficient can differ depending on the grouping of the scale items into halves. While the odd versus even grouping is quite frequently used, so is the first half of the items versus the last half of the items. Theoretically, any arrangement of items into two groups is possible. With each different grouping, it is quite probable that different reliability coefficients will be obtained.
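Returning to the Spearman-Brown calculation, the following illustrative sketch (not part of the original text) expresses it directly in Python; the reliabilities 0.50 and 0.80 are assumed values chosen only to reproduce the factor of 4.0 mentioned above.

    # Sketch of the Spearman-Brown formula and the "prophecy" lengthening factor.

    def spearman_brown(r1, n):
        """Reliability of a test lengthened (or shortened) by a factor n,
        given the reliability/correlation r1 of the current form."""
        return n * r1 / (1 + (n - 1) * r1)

    def lengthening_factor(r_current, r_desired):
        """Factor by which the number of items must be multiplied to reach r_desired."""
        return r_desired * (1 - r_current) / (r_current * (1 - r_desired))

    r_half = 0.60  # hypothetical correlation between the two halves
    print(f"Split-half reliability of the full test: {spearman_brown(r_half, 2):.2f}")

    n = lengthening_factor(r_current=0.50, r_desired=0.80)  # assumed values
    print(f"Lengthening factor: {n:.1f} -> {n * 20:.0f} items for a 20-item scale")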
4. Kuder-Richardson Reliability (Internal Consistency Method)
When test items are scored dichotomously (right or wrong) on an untimed test assumed to measure one characteristic or quality, the extent to which the items are all measuring this same characteristic or quality can be determined by examining individual item scores rather than part or total scores (as in the split-half method). The formula is known as K-R formula 21:

r(K-R21) = [N/(N - 1)] [1 - X(N - X)/(N S^2)]

where r(K-R21) is the Kuder-Richardson reliability coefficient, N is the number of items in the test, X is the mean score on the test, and S is the standard deviation (a measure of variability). This method offers a fourth alternative for estimating reliability. It requires neither repeated measures nor splitting of scale items, and it provides an estimate of reliability that is not altered by the arrangement of scale items. The procedure is based on the interpretation of reliability as internal consistency, that is, homogeneity of scale items. As Kerlinger observes, "This interpretation (internal consistency) in effect boils down to the same idea as other interpretations: accuracy."
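A minimal sketch of the K-R 21 calculation follows; the test length, mean, and standard deviation are assumed values used only for illustration.

    # Sketch of Kuder-Richardson formula 21 (hypothetical numbers).

    def kr21(n_items, mean, sd):
        """K-R 21 reliability from the number of items, the mean total score,
        and the standard deviation of total scores (dichotomous items)."""
        variance = sd ** 2
        return (n_items / (n_items - 1)) * (1 - mean * (n_items - mean) / (n_items * variance))

    # e.g. a 50-item test with mean 35 and standard deviation 8 (assumed values)
    print(f"K-R 21 reliability: {kr21(50, 35, 8):.2f}")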
Any randomly chosen sample of items from the scale, when correlated with any other, different, randomly chosen sample of items from the same scale, should, if the scale is reliable, produce the same rank ordering of subjects. Repeated indefinitely, the mean correlation coefficient computed in this manner is the estimate of reliability. The most popular internal consistency formula is Cronbach's alpha:

alpha = [N/(N - 1)] [1 - Sum Var(Yi) / Var(X)]

where alpha is the coefficient, N is the number of items, Sum Var(Yi) is the sum of the item variances, and Var(X) is the total composite variance.
Interpretation of Cronbach's alpha is similar to that of split halves with 2N items. It can also be considered as the expected correlation between the actual scale and a hypothetical alternative form of the same scale.
Cronbach's alpha has also been shown to be a conservative estimate of a measure's reliability. Hence, it is safe to assume that a scale with an acceptable coefficient alpha is a reliable measure. On a technical note, alpha may also be computed from the correlation matrix using the following formula:

alpha = Np / [1 + p(N - 1)]

where p is the mean inter-item correlation. From this equation, it is clear that alpha is a function of both the number of scale items and the inter-item correlation: as the number of items or the inter-item correlation increases, alpha increases. Carmines and Zeller constructed Table 2 to depict this. Note that a scale with six items and an inter-item correlation of 0.4 has higher reliability than a scale with four items and the same inter-item correlation (0.800 versus 0.727). Moreover, a scale with two items and an inter-item correlation of 0.6 has less reliability than a scale with six items and a 0.4 inter-item correlation. As the authors observe, "In sum, the addition of more items to a scale that does not result in a reduction of average inter-item correlation will increase the reliability of one's measuring instrument." 37
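The following sketch (illustrative only; the item scores are invented) computes alpha both from the item variances, per the first formula, and from the mean inter-item correlation, per the formula above, reproducing one of the Table 2 entries.

    # Sketch of Cronbach's alpha, computed two ways (hypothetical item data).
    import statistics

    def cronbach_alpha(items):
        """alpha = [N/(N-1)] * [1 - sum(item variances) / variance(total scores)].
        `items` is a list of N lists, one per item, each holding one score per subject."""
        n = len(items)
        item_vars = sum(statistics.pvariance(col) for col in items)
        totals = [sum(scores) for scores in zip(*items)]
        return (n / (n - 1)) * (1 - item_vars / statistics.pvariance(totals))

    def alpha_from_correlation(n, mean_r):
        """Standardized alpha from the number of items and the mean inter-item correlation."""
        return n * mean_r / (1 + mean_r * (n - 1))

    # Three hypothetical 5-point items answered by six subjects
    items = [[4, 3, 5, 2, 4, 3],
             [4, 2, 5, 3, 4, 2],
             [5, 3, 4, 2, 5, 3]]
    print(f"alpha from item variances: {cronbach_alpha(items):.2f}")

    # Reproduces a Table 2 entry: six items with mean inter-item correlation 0.4
    print(f"alpha from mean correlation: {alpha_from_correlation(6, 0.4):.3f}")  # 0.800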
Cronbach's alpha is used to estimate internal consistency when items are of at least an ordinal level of measurement with more than two intervals. When items are nominal and scored dichotomously, the appropriate estimate of internal consistency is the Kuder-Richardson formula:
KR20 = [N/(N - 1)] [1 - Sum(PiQi) / Var(X)]

where KR20 is the Kuder-Richardson coefficient, N is the number of dichotomously scored items, Pi is the proportion responding positively to the ith item, Qi is (1 - Pi), and Var(X) is the total composite variance. KR20 is interpreted in the same way as the alpha coefficient.

What is a Satisfactory Level of Reliability? This question is frequently asked but is difficult to answer with any degree of definiteness. As a general rule, most researchers seem to use 0.80 as the minimum cutoff for widely used scales.
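Referring back to the KR20 formula above, a minimal sketch for dichotomously scored items follows; the 0/1 responses are hypothetical.

    # Sketch of Kuder-Richardson formula 20 (hypothetical 0/1 item data).
    import statistics

    def kr20(items):
        """KR20 = [N/(N-1)] * [1 - sum(p_i * q_i) / variance(total scores)].
        `items` is a list of N lists of 0/1 scores, one list per item."""
        n = len(items)
        props = [sum(col) / len(col) for col in items]          # p_i for each item
        pq = sum(p * (1 - p) for p in props)                    # sum of p_i * q_i
        totals = [sum(scores) for scores in zip(*items)]        # total score per subject
        return (n / (n - 1)) * (1 - pq / statistics.pvariance(totals))

    # Five dichotomous items answered by eight subjects (assumed data)
    items = [[1, 1, 0, 1, 1, 0, 1, 1],
             [1, 0, 0, 1, 1, 0, 1, 1],
             [1, 1, 0, 1, 0, 0, 1, 1],
             [0, 1, 0, 1, 1, 0, 1, 0],
             [1, 1, 0, 1, 1, 0, 0, 1]]
    print(f"KR-20 reliability: {kr20(items):.2f}")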
Table 2. Values of Cronbach's Alpha for Various Combinations of Different Numbers of Items and Different Inter-Item Correlations.
Number of Items    Average Inter-Item Correlation
                   0.0      0.2      0.4      0.6      0.8      1.0
--------------------------------------------------------------------
       2           0.000    0.333    0.572    0.750    0.889    1.000
       4           0.000    0.500    0.727    0.857    0.941    1.000
       6           0.000    0.600    0.800    0.900    0.960    1.000
       8           0.000    0.666    0.842    0.924    0.970    1.000
      10           0.000    0.714    0.870    0.938    0.976    1.000
Source: Reference 22.
Reliability coefficients may be interpreted as estimates of the degree of accuracy in measuring with a particular scale. Since the coefficient can range from zero to + 1.0, the higher the coefficient the more accurate the measure (less random error).
TEST VALIDITY
The validity of a test is the extent to which the test measures what it purports to measure; validity is concerned with whether the test measures the theoretical construct it is intended to measure.
1. Predictive validity
Validity can be established by relating a test to some actual behavior the test is supposed to predict. If a test can be used to predict an outcome in terms of some performance or behavior criterion, the predictive validity of that test can be obtained by relating test performance to the related behavioral criterion.
For example, the predictive validity of a college entrance test could be established by administering the test to students at the start of their freshman year and then seeing what percentage of the high scorers survive four years of college and what percentage of the low scorers drop out.
2. Concurrent validity
For some tests, particularly those that measure characteristics or qualities, it is difficult to establish predictive validity because it is not easy to identify specific performance outcomes related to that characteristic or quality. Example: intelligence tests are often validated concurrently by comparing performance on a newer, more experimental test with performance on an older, more established one. Another procedure for establishing the concurrent validity of a test is to compare qualities or performance as assessed by that test with the qualities or performance as assessed by another procedure, such as human judges. Agreement between the test and the judges would be an indication of the test's concurrent validity.

3. Construct validity
A test builder might reason that a student with a high level of self-esteem would be more inclined to speak out when unjustly criticized by an authority figure; such behavior can be explained by the construct (or concept) of self-esteem.
Construct validity is established by relating a test that presumably measures a construct or hypothetical quality to some behavior that the construct is hypothesized to explain.
4. Content validity
A test is an attempt to determine how an individual will function in a set of actual situations. Rather than placing individuals in each actual situation, a test is used as a shortcut to determine their behavior or performance in those situations. Constructing the test requires a selection or sampling of situations from the total set. On the basis of the individuals' performance on these sample situations, the researcher should be able to generalize about the full set of situations. A test in which the sample of situations or performances measured is representative of the set from which the sample was drawn (and about which generalizations are to be made) is considered to have content validity.

Example: suppose a researcher constructed a performance test of secretarial skills to be used by companies for screening applicants for jobs. The content validity of this test could be established by comparing: 1. the skill areas covered in the test and the number of test items devoted to each; and 2. the skill requirements of the job and their relative importance.

Content Validation. Content validity is concerned with the extent to which a given measure is representative of the content of the property being measured. Content validation seeks to answer the question: Is the content of this measure representative of the content of the property being measured? 40 For example, if a test of management skills contains only questions dealing with financial management, neglecting personnel and systems management topics, the test would not be valid; management skills are more extensive than financial management, so by excluding the other content areas the test would not be truly representative of the subject. The requirement of representativeness places severe limitations on the content validation process. One must be able to (1) define the full domain of relevant content; (2) randomly select items of content for measurement; and (3) design measurement processes to score the content. 41 While this may be accomplished easily in developing achievement tests, it becomes increasingly difficult as the subject becomes more complex. Essentially, content validation rests on the judgment of the researcher.
Criterion-related Validation. As the name implies, this validation process consists of relating the test scores obtained to some external variable(s) or criteria believed to measure the phenomenon or event under study.
For example, the criterion-related validity of a college entrance test would be demonstrated by a high degree of correlation between the test scores and student performance during the first year of college.
Unlike the empirical tests cited for estimates of reliability, validity estimates are less exacting and often more subjective in their interpretation. The important distinction to remember is that validity is determined not for the measuring instrument itself, but rather for "the measuring instrument in relation to the purpose for which it is being used." 39 Thus, a measure may be valid for assessing one construct but invalid for assessing a second construct, even though the second may be closely related to the first. The concept of validity as "measuring what it purports to measure" must be kept constantly in mind. Validity is typically assessed in terms of content, criterion-related, or construct validity.
Once a research question has been determined, the next step is to identify which method will be appropriate and effective.
Survey (Questionnaire; Interview; Standardized Scales/Instruments): To learn what people think about leisure motivation and to identify relationships between motivation and satisfaction. Interviews, surveys, and standardized scales are used.

Experimental (True designs; Quasi designs): To obtain information under controlled conditions about leisure attitudes and experience with virtual reality. Subjects may be randomly assigned to various tests and experiences, then assessed via observation or standardized scales.

Other Field Methods (Nominal Group Technique; Delphi; Focus Group): To identify trends and issues about leisure services, management, and delivery systems. Various group, question, and pencil-and-paper exercises are used by facilitators.

Multimethods Approach (combination of the methods shown above): Interviews, journals, and quantitative measures are combined to provide a more accurate definition and operationalization of the concept.

Source: Isaac & Michael, 1985; Leedy, 1985; Dandekar, 1988; Thomas & Nelson, 1990.
Each research method has its strengths and weaknesses. When designing a research study, it is important to decide what outcome (data) the study should produce and then select the best methodology to produce that desired information.

Experimental Treatments
Experimental designs provide the basis for tests of statistical significance. An example of the fundamentals of an experimental design is shown below.
A researcher is interested in the effect of an outdoor recreation program (the independent variable, experimental treatment, or intervention variable) on the behaviors (dependent or outcome variables) of youth-at-risk. In this example, the independent variable (the outdoor recreation program) is expected to effect a change in the dependent variable. Even with a well-designed study, a question remains: how can the researcher be confident that the changes in behavior, if any, were caused by the outdoor recreation program and not by some other, intervening or extraneous variable? An experimental design does not eliminate intervening or extraneous variables, but it attempts to account for their effects.

Experimental Control
Experimental control is associated with four primary factors (Huck, Cormier, & Bounds, 1974).
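As an illustrative sketch (not drawn from the source), random assignment, one common means of exercising experimental control, could be implemented for the youth-at-risk example as follows; the subject identifiers and group sizes are hypothetical.

    # Sketch of random assignment in a simple two-group experimental design
    # (hypothetical subject IDs; illustrates the logic described above, not the
    # cited authors' procedure).
    import random

    def randomly_assign(subjects, seed=None):
        """Shuffle subjects and split them into treatment and control groups."""
        rng = random.Random(seed)
        pool = list(subjects)
        rng.shuffle(pool)
        half = len(pool) // 2
        return pool[:half], pool[half:]

    subjects = [f"youth_{i:02d}" for i in range(1, 21)]  # 20 hypothetical participants
    treatment, control = randomly_assign(subjects, seed=42)

    print("Outdoor recreation program (treatment):", treatment)
    print("No program (control):", control)
    # Post-test behavioral scores for the two groups would then be compared,
    # e.g. with a t-test, to assess the program's effect.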