Research Principles and Methods: Types of Instruments
Development of Instruments, Validity and Reliability
Reliability
Reliability refers to a test's consistency.
Reliability concerns the extent to which a measure is accurate and consistent: will it yield both exact and consistent results across repeated measurements?
Factors that contribute to test unreliability:
1. familiarity with a particular test form (e.g., multiple-choice questions)
2. fatigue
3. emotional strain
4. physical conditions during the test
5. health of the examinee
6. fluctuations of human memory
7. amount of practice or experience by the examinee with the skill being measured
8. specific knowledge that has been gained outside of the experience being evaluated on the test
A test that is overly sensitive to these unpredictable (uncontrollable) sources of error is not reliable.
Test unreliability creates instrumentation bias, a source of internal invalidity in an experiment.
Before drawing conclusions, the reliability of the test instrument used in the experiment should be assessed.
There are four basic methods for assessing reliability:
1. Retest Method
Give the same people the same test on more than one occasion and then compare each person's performance across the different testings.
In this procedure, the scores obtained by each person on the first administration of the test are correlated with those from the second to provide a reliability coefficient.
Coefficients vary from 0 (no relationship) to 1.00 (perfect relationship); coefficients near zero are rare.
The coefficient is an indication of the extent to which the test is stable across measurements.
The method has the disadvantage of being influenced by practice, memory, and whatever events occur between testing sessions.
Conceptually, the simplest method of assessing reliability is to correlate the measures or scores obtained from two testings of the same subjects at two different times.

Example: the researcher desires to use a series of scales to measure physicians' attitudes toward product selection by the pharmacist. In the retest method, the attitude scales would be administered to a group of physicians, then repeated with the same group of physicians approximately 1-2 weeks later.
Intuitively, the retest method is an appealing technique to assess reliability. It is logical, given the basic meaning of reliability. It is not, however, without serious problems and limitations.
First, it is inconvenient to make dual measurements on the same population.
Second, if the time period between tests is short, the scores may be more the result of memory than of test reliability.
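The retest coefficient is simply the correlation between the two sets of scores. A minimal Python sketch of that computation follows; the two score lists are hypothetical stand-ins for the physicians' attitude totals, not data from the source.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical attitude-scale totals for 8 physicians, tested about 2 weeks apart
time1 = [42, 37, 55, 48, 33, 60, 45, 51]
time2 = [40, 39, 53, 50, 30, 58, 47, 49]

print(f"Test-retest reliability coefficient: {pearson_r(time1, time2):.2f}")
```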

2. Alternate-Form Method

Reliability is determined by administering alternate forms of a test to the same people and computing the relation between each person's scores on the two forms.
This approach requires two forms of a test that parallel one another in the content and mental operations required; the two tests must have items on one form matched to items on the other form, with corresponding items measuring the same quality.
This approach can be used to assess the reliability of either of the two forms by comparison with the other, or to determine the extent to which the two forms are parallel; the latter determination is particularly important if one form is to be used as a pretest and the other as a posttest.
The alternate-form method assesses reliability by measuring the construct with two different sets of scales which are intended to measure the same thing.
This technique is used extensively in education, where two forms of an achievement test can be constructed with the same degree of difficulty in content. The reliability is estimated, as with the retest method, by calculating the correlation coefficient between the same respondents' scores collected at two different times.

Although this technique is superior to retest in some ways, it is critical to ensure that the problem of instrumentation does not creep in to cause further errors.

3. Split-Half Reliability

To determine internal consistency quickly: split a test into two halves, usually the odd-numbered items and the even-numbered items, and then correlate the scores obtained by each person on one half with those obtained by each person on the other.
This enables the researcher to determine whether the halves of a test are measuring the same quality or characteristic.
The obtained correlation coefficient between the two halves (r1) is then entered into the Spearman-Brown formula to calculate the test reliability (r2):

r2 = n r1 / [1 + (n - 1) r1]

where n is the factor by which the number of items is increased (n = 2 when the two halves are combined into the full-length test).

Using the Spearman-Brown formula in this way, it might be estimated, for example, that a scale would need to be 4.0 times the present number of scale items (4.0 * 20 = 80 items) to achieve the desired reliability. The major problem with the split-halves reliability assessment is that the reliability coefficient can differ depending on the grouping of the scale items into halves. While the odd-versus-even grouping is quite frequently used, so is the first half of the items versus the last half. Theoretically, any arrangement of items into two groups is possible. With each different grouping, it is quite probable that different reliability coefficients will be obtained.
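A short sketch of the split-half procedure and the Spearman-Brown step-up described above; the helper names are illustrative, and the 0.50/0.80 figures in the prophecy example are assumed values chosen so that the lengthening factor works out to 4.0.

```python
from statistics import correlation  # Pearson r; available in Python 3.10+

def spearman_brown(r, n):
    """Reliability of a test lengthened by a factor of n, given reliability r."""
    return (n * r) / (1 + (n - 1) * r)

def split_half_reliability(scores):
    """Odd/even split-half reliability, stepped up to full length (n = 2)."""
    odd = [sum(row[0::2]) for row in scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in scores]  # items 2, 4, 6, ...
    return spearman_brown(correlation(odd, even), 2)

def lengthening_factor(current_r, target_r):
    """Spearman-Brown prophecy: how many times longer the scale must become."""
    return (target_r * (1 - current_r)) / (current_r * (1 - target_r))

# Assumed values: a 20-item scale with reliability 0.50 and a target of 0.80
# would need to be 4.0 times as long (4.0 * 20 = 80 items).
print(lengthening_factor(0.50, 0.80))   # 4.0
```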

4. Kuder-Richardson Reliability (Internal Consistency Method)

When test items are scored either right or wrong on an untimed test assumed to measure one characteristic or quality, the extent to which the items are all measuring this same characteristic or quality can be determined by examining individual item scores rather than part or total scores (as in the split-half method).
This formula is known as K-R formula 21:

r_KR21 = [N / (N - 1)] [1 - X(N - X) / (N S²)]

where r_KR21 = Kuder-Richardson reliability coefficient; N = number of items in the test; X = mean score on the test; S = standard deviation (a measure of variability).
This method offers a fourth alternative for estimating reliability. It requires neither repeated measures nor splitting of scale items. The procedure provides an estimate of reliability that is not altered by the arrangement of scale items.
This procedure is based on the interpretation of reliability as internal consistency, that is, homogeneity of scale items. As Kerlinger observes, "This interpretation (internal consistency) in effect boils down to the same idea as other interpretations: accuracy."
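The K-R 21 coefficient above can be computed directly from the summary statistics it uses (N, X, S). A small sketch with hypothetical values:

```python
def kr21(num_items, mean_score, sd):
    """K-R formula 21: N = number of items, X = mean score, S = standard deviation."""
    n, x, s = num_items, mean_score, sd
    return (n / (n - 1)) * (1 - (x * (n - x)) / (n * s ** 2))

# Hypothetical 20-item right/wrong test with a mean score of 14.0 and SD of 4.0
print(round(kr21(20, 14.0, 4.0), 3))   # about 0.776
```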

Any randomly chosen sample of items from the scale, when correlated with any other, different, randomly chosen sample of items from the same scale, should, if the scale is reliable, produce the same rank ordering of subjects. Repeated indefinitely, the mean correlation coefficient computed in this manner is the estimate of reliability. The most popular internal consistency formula is Cronbach's alpha:

alpha = [N / (N - 1)] [1 - Σσ²(Yi) / σ²x]

where alpha is the reliability coefficient; N is the number of items; Σσ²(Yi) is the sum of the item variances; and σ²x is the total composite variance.
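A minimal sketch of the item-variance form of alpha given above, assuming "items" holds one list of scores per item (one entry per respondent) and using population variances; the data are hypothetical.

```python
from statistics import pvariance

def cronbach_alpha(items):
    """alpha = [N/(N-1)] * [1 - (sum of item variances) / (total composite variance)]."""
    n = len(items)                                     # N = number of items
    sum_item_var = sum(pvariance(item) for item in items)
    totals = [sum(resp) for resp in zip(*items)]       # composite score per respondent
    return (n / (n - 1)) * (1 - sum_item_var / pvariance(totals))

# Hypothetical 3-item scale answered by 5 respondents (rows = items)
items = [
    [4, 3, 5, 2, 4],   # item 1
    [5, 3, 4, 2, 4],   # item 2
    [4, 2, 5, 3, 5],   # item 3
]
print(round(cronbach_alpha(items), 3))
```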

Interpretation of Cronbach's alpha is similar to that of split halves with 2N items.
It can also be considered the expected correlation between the actual scale and a hypothetical alternative form of the same scale.

Cronbach's alpha has also been shown to be a conservative estimate of a measure's reliability. Hence, it is safe to assume that a scale with an acceptable coefficient alpha is a reliable measure.
On a technical note, alpha may also be computed from the correlation matrix using the following formula:

alpha = N p / [1 + p (N - 1)]

where p is the mean inter-item correlation. From this equation, it is clear that alpha is a function of both the number of scale items and the inter-item correlation. As the number of items or the inter-item correlation increases, alpha increases. Carmines and Zeller constructed Table 2 to depict this. Note that a scale with six items and an inter-item correlation of 0.4 has a higher reliability than a scale with four items at the same inter-item correlation (0.800 versus 0.727). Moreover, a scale with two items and an inter-item correlation of 0.6 has less reliability than a scale of six items and a 0.4 inter-item correlation. As the authors observe, "In sum, the addition of more items to a scale that does not result in a reduction of average inter-item correlation will increase the reliability of one's measuring instrument." 37
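The correlation-matrix form can be checked against Table 2 below; a quick sketch using a few item counts and mean inter-item correlations from that table:

```python
def alpha_from_mean_r(num_items, mean_inter_item_r):
    """alpha = N * p / [1 + p * (N - 1)], with p the mean inter-item correlation."""
    n, p = num_items, mean_inter_item_r
    return (n * p) / (1 + p * (n - 1))

# Reproduce a few Table 2 entries
for n, p in [(2, 0.2), (4, 0.4), (6, 0.4), (8, 0.8)]:
    print(n, p, round(alpha_from_mean_r(n, p), 3))   # 0.333, 0.727, 0.8, 0.97
```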

Cronbach's alpha is used to estimate internal consistency when items are of at least an ordinal level of measurement with more than two intervals. When items are nominal and scored dichotomously, the appropriate estimate of internal consistency is the Kuder-Richardson formula:

KR20 = [N / (N - 1)] [1 - ΣPiQi / σ²x]

where KR20 is the Kuder-Richardson coefficient; N is the number of dichotomously scored items; Pi is the proportion responding positively to the ith item; Qi is (1 - Pi); and σ²x is the total composite variance. KR20 is interpreted in the same way as the alpha coefficient.

What Is a Satisfactory Level of Reliability?

This question is frequently asked but is difficult to answer with any real precision. As a general rule, most researchers seem to use 0.80 as the minimum cutoff for widely used scales.
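A sketch of the KR20 computation for dichotomously scored (0/1) items, mirroring the alpha sketch above with the item variance replaced by Pi*Qi; the answer matrix is hypothetical.

```python
from statistics import pvariance

def kr20(answers):
    """KR20 = [N/(N-1)] * [1 - sum(Pi*Qi) / total composite variance]."""
    n = len(answers)                                   # N = number of dichotomous items
    sum_pq = 0.0
    for item in answers:
        p = sum(item) / len(item)                      # Pi: proportion answering positively
        sum_pq += p * (1 - p)                          # Pi * Qi
    totals = [sum(resp) for resp in zip(*answers)]     # composite score per examinee
    return (n / (n - 1)) * (1 - sum_pq / pvariance(totals))

# Hypothetical 4-item right/wrong test taken by 6 examinees (rows = items)
answers = [
    [1, 1, 0, 1, 0, 1],
    [1, 0, 0, 1, 0, 1],
    [1, 1, 0, 1, 1, 1],
    [0, 1, 0, 1, 0, 1],
]
print(round(kr20(answers), 3))
```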

Table 2. Values of Cronbach's Alpha for Various Combinations of Different Numbers of Items and Different Inter-Item Correlations.

Number of Items     Average Inter-Item Correlation
                    0.0      0.2      0.4      0.6      0.8      1.0
---------------------------------------------------------------------
      2             0.00     0.333    0.572    0.750    0.889    1.00
      4             0.00     0.500    0.727    0.857    0.941    1.00
      6             0.00     0.600    0.800    0.900    0.960    1.00
      8             0.00     0.666    0.842    0.924    0.970    1.00
     10             0.00     0.714    0.870    0.938    0.976    1.00

Source: Reference 22.

Reliability coefficients may be interpreted as estimates of the degree of accuracy in measuring with a particular scale.
Since the coefficient can range from zero to +1.0, the higher the coefficient, the more accurate the measure (less random error).

TEST VALIDITY

The validity of a test is the extent to which the test measures what it purports to measure.
Validity is concerned with whether the test measures the theoretical construct it is intended to measure.

Four Forms of Validity

1. Predictive validity

Validity can be established by relating a test to some actual behavior the test is supposed to predict.
If a test can be used to predict an outcome in terms of some performance or behavioral criterion, the predictive validity of that test can be obtained by relating test performance to the related behavioral criterion.

Example: a test intended to predict student "staying power" in college could be validated by administering the test to students at the start of their freshman year and then seeing what percentage of the high scorers survive four years of college and what percentage of the low scorers drop out.
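A rough sketch of how that comparison might be tallied; the entrance-test scores and completion flags are hypothetical, and the median split standing in for "high scorers" versus "low scorers" is an assumption for illustration.

```python
from statistics import median

def persistence_by_score_group(scores, graduated):
    """Compare four-year persistence rates for high vs. low scorers (median split)."""
    cut = median(scores)
    high = [g for s, g in zip(scores, graduated) if s >= cut]
    low = [g for s, g in zip(scores, graduated) if s < cut]
    return sum(high) / len(high), sum(low) / len(low)

# Hypothetical freshman entrance-test scores and whether each student finished 4 years
scores = [72, 85, 90, 60, 55, 78, 95, 65, 88, 58]
graduated = [1, 1, 1, 0, 0, 1, 1, 0, 1, 0]

high_rate, low_rate = persistence_by_score_group(scores, graduated)
print(f"High scorers persisting: {high_rate:.0%}, low scorers persisting: {low_rate:.0%}")
```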

2. Concurrent validity

For some tests, particularly those that measure characteristics or qualities, it is difficult to establish predictive validity because it is not easy to identify specific performance outcomes related to that characteristic or quality.
Example: intelligence tests are often validated concurrently by comparing performance on a newer, more experimental test with performance on an older, more established one.
Another procedure for establishing the concurrent validity of a test is to compare qualities or performance as assessed by that test to the qualities or performance as assessed by another procedure, such as human judges.
Agreement between the test and the judges would be an indication of the test's concurrent validity.

3. Construct validity

A test builder might reason that a student with a high level of self-esteem would be more inclined to speak out when unjustly criticized by an authority figure; such behavior can be explained by the construct (or concept) of self-esteem.

Construct validity is established by relating a presumed measure of a construct or hypothetical quality to some behavior that the construct is thought to explain.

4. Content validity

A test is an attempt to determine how an individual will function in a set of actual situations. Rather than placing individuals in each actual situation, a test is used as a shortcut to determine their behaviors or performances in those situations.
Constructing the test requires a selection or sampling of situations from the total set.
On the basis of the individual's performance on these sample situations, the researcher should be able to generalize regarding the full set of situations.
A test in which the sample of situations or performances measured is representative of the set from which the sample was drawn (and about which generalizations are to be made) is considered to have content validity.
Example: suppose a researcher constructed a performance test of secretarial skills to be used by companies for screening applicants for jobs. The content validity of this test could be established by comparing:
1. the skill areas covered in the test and the number of test items devoted to each, with
2. the skill requirements of the job and their relative importance.

Content Validation

Content validity is concerned with the extent to which a given measure is representative of the content. Content validation seeks to answer the question: Is the content of this measure representative of the content of the property being measured? 40
For example, if a test of management skills contains only questions dealing with financial management, neglecting personnel and systems management topics, the test would not be valid. Management skills are more extensive than financial management. Thus, by excluding the other content areas, the test would not be truly representative of the subject.
The requirement of representativeness places severe limitations on the content validation process. One must be able to (1) define the full domain of relevant content; (2) randomly select items of content for measurement; and (3) design measurement processes to score the content. 41 While this may be accomplished easily in developing achievement tests, it becomes increasingly difficult as the subject becomes more complex. Essentially, content validation rests largely on the judgment of the researcher.

Criterion-Related Validation

As the name implies, this validation process consists of relating the test scores obtained to some external variable(s) or criteria believed to measure the phenomenon or event under study.

Example: one way to validate the Pharmacy College Admission Test (PCAT) would be to show a high degree of correlation between the test scores and student performance during the first year.

Tests of Validity

Unlike the empirical tests cited for estimates of reliability, validity estimates are less exacting and often more subjective in their interpretation. The important distinction that must be remembered is that validity is determined not for the measuring instrument itself, but rather for "the measuring instrument in relation to the purpose for which it is being used." 39 Thus, a measure may be valid for assessing one construct but invalid for assessing a second construct, even though the second may be closely related to the first. The concept of validity as "measuring what it purports to measure" must be remembered constantly. Validity is typically assessed in terms of content, criterion-related, or construct validity.

Once a research question has been determined, the next step is to identify which method will be appropriate and effective.

Survey (questionnaire, interview, standardized scales/instruments)
To learn what people think about leisure motivation; to identify relationships between motivation and satisfaction. Uses interviews, surveys, and standardized scales.

Experimental (true designs, quasi designs)
To obtain information under controlled conditions about leisure attitudes and experience with virtual reality. Subjects may be randomly assigned to various tests and experiences, then assessed via observation or standardized scales.

Other field methods (Nominal Group Technique, Delphi)
To identify trends and issues about leisure services, management, and delivery systems. Focus groups and various group, question, and pencil-and-paper exercises are used by facilitators.

Multimethod approach (combination of the methods shown)
Interviews, journals, and quantitative measures are combined to provide a more accurate definition and operationalization of the concept.

Source: Isaac & Michael, 1985; Leedy, 1985; Dandekar, 1988; Thomas & Nelson, 1990.

Each research method has its strengths and weaknesses. When designing a research study, it is important to decide what outcome (data) the study should produce and then select the best methodology to produce that desired information.

Experimental Treatments

Experimental designs provide the basis for tests of statistical significance. An example of the fundamentals of an experimental design is shown below.

A researcher is interested in the effect of an outdoor recreation program (the independent variable, experimental treatment, or intervention variable) on behaviors (dependent or outcome variables) of youth at risk. In this example, the independent variable (the outdoor recreation program) is expected to effect a change in the dependent variable. Even with a well-designed study, a question remains: how can the researcher be confident that the changes in behavior, if any, were caused by the outdoor recreation program and not some other, intervening or extraneous variable? An experimental design does not eliminate intervening or extraneous variables, but it attempts to account for their effects.

Experimental Control

Experimental control is associated with four primary factors (Huck, Cormier, & Bounds, 1974):

1. The random assignment of individual subjects to comparison groups;
2. The extent to which the independent variable can be manipulated by the researcher;
3. The time when the observations or measurements of the dependent variable occur; and
4. Which groups are measured and how.

Treatment Group: The portion of a sample or population that is exposed to a manipulation of the independent variable is known as the treatment group. For example, youth who enroll and participate in recreation programs are the treatment group, and the group to which no recreation services are provided constitutes the control group (a random-assignment sketch appears below).

Validity Issues: There are two primary criteria for evaluating the validity of an experimental design.
1. Internal validity determines whether the independent variable made a difference in the study. Can a cause-and-effect relationship be observed? To achieve internal validity, the researcher must design and conduct the study so that only the independent variable can be the cause of the results (Cozby, 1993).
2. External validity refers to the extent to which findings can be generalized or be considered representative of the population.
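The random assignment mentioned above (factor 1 and the treatment/control split) can be illustrated with a minimal sketch; the roster names and group sizes are hypothetical, not from the source.

```python
import random

def randomly_assign(participants, seed=None):
    """Randomly split participants into a treatment group and a control group."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]   # (treatment, control)

# Hypothetical roster of youth-at-risk participants
roster = [f"participant_{i}" for i in range(1, 21)]
treatment, control = randomly_assign(roster, seed=42)
print(len(treatment), "assigned to the outdoor recreation program;",
      len(control), "assigned to the no-program control group")
```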
Confounding Errors: conditions that may confuse the effect of the independent variable with that of some other variable(s), for example:
1. Premeasurement and interaction errors
2. Maturation errors
Ex Post Facto Studies

1. Formulate the research problem, including identification of factors that may influence the dependent variable(s).
2. Identify alternate hypotheses that may explain the relationships.
3. Identify and select subject groups.
4. Collect and analyze data.

Ex post facto studies cannot prove causation, but they may provide insight into the understanding of a phenomenon.

References
1. Stevens S. Handbook of experimental psychology. New York: John Wiley & Sons; 1951: 22.
2. Kerlinger FN. Foundations of behavioral research, 2nd ed. New York: Holt, Rinehart and Winston; 1973: 426-41.
3. Hepler CD. Problems and hypotheses. Am J Hosp Pharm. 1980; 37: 257-63.
4. Carmines EG, Zeller RA. Reliability and validity assessment. Sage University series on quantitative applications in the social sciences, series no. 07-001. Beverly Hills: Sage Publications; 1979: 10-1.
5. Phillips BS. Social research strategy and tactics, 2nd ed. New York: Macmillan; 1971: 205.
6. Smith PC, Kendall LM, Hulin CL. The measurement of satisfaction in work and retirement. Chicago: Rand McNally; 1969.
7. Siegal S. Nonparametric statistics for the behavioral sciences. New York: McGraw-Hill; 1956: 21-7.
8. Torgerson WS. Theory and methods of scaling. New York: John Wiley & Sons; 1958: Chapter 3.
9. Goodenough WH. A technique for scale analysis. Educ Psychol Measmt. 1944; 4: 179-90.
10. Emory CM. Business research methods. Homewood, IL: Irwin; 1981: 257-92.
11. Schooler C. A note of extreme caution on the use of the Guttman scales. Am J Soc. 1966; 29.
12. Robinson JP. Toward a more appropriate use of Guttman scaling. Pub Opin Q. 1973; 37: 260-7.
13. Johnston WP. Institutional pharmacy administrative skills and undergraduate administrative content. Unpublished M.S. thesis, University of South Carolina, 1979.
14. Schwab DP, Heneman HG, DeCotiis TA. Behaviorally anchored rating scales: A review of the literature. Personnel Psychology. 1975; 29: 549-62.
15. Snider JG, Osgood CE. Semantic differential technique: A source book. Chicago: Aldine; 1972.
16. Knapp DE, Knapp DA, Edwards JD. The pharmacist as perceived by physicians, patrons, and other pharmacists. J Am Pharm Assoc. 1969; NS9: 80-2, 84.
17. Kerlinger. op. cit. 439-41.
18. Acock AC, Marin JD. The undermeasurement controversy: Should ordinal data be treated as interval? Sociol Soc Res. 1974; 58: 427-33.
19. Guilford JP. Psychometric methods, 2nd ed. New York: McGraw-Hill; 1954.
20. Guilford. op. cit. Chapters 8 and 9.
21. Kerlinger. op. cit. 442-73.
22. Carmines and Zeller. op. cit. 11-6.
23. Stanley JC. Reliability. In: Thorndike RL, ed. Educational measurement. Washington: American Council on Education; 1971: 365-442.
24. Kerlinger. op. cit. 437.
25. Schack DW, Hepler CD. Modification of Hall's professionalism scale for use with pharmacists. Am J Pharm Educ. 1979; 43: 98-104.
26. Carmines and Zeller. op. cit. 13.
27. Ibid. 15.
28. Ibid. 33.
29. Bohrnstedt GW. Measurement. In: Wright J, Rossi P, eds. Handbook of survey research. New York: Academic Press; 1979.
30. Carmines and Zeller. op. cit. 41.
31. Spearman C. Correlation calculated from faulty data. Br J Psychol. 1910; 3: 271-95.
32. Brown W. Some experimental results in the correlation of mental abilities. Br J Psychol. 1910; 3: 296-322.
33. Kerlinger. op. cit. 451.
34. Cronbach LJ. Coefficient alpha and the internal structure of tests. Psychometrika. 1951; 16: 297-334.
35. Novick M, Lewis G. Coefficient alpha and the reliability of composite measurements. Psychometrika. 1967; 32: 1-13.
36. Nunnally JC. Psychometric theory. New York: McGraw-Hill; 1978.
37. Carmines and Zeller. op. cit. 46.
38. Kuder GF, Richardson MW. The theory of the estimation of test reliability. Psychometrika. 1937; 2: 151-60.
39. Carmines and Zeller. op. cit. 17.
40. Kerlinger. op. cit. 457-62.
41. Carmines and Zeller. op. cit. 20.
42. Ibid. 23.