Saudi J Anaesth. 2017 May; 11(Suppl 1): S80–S89.
Guidelines for developing, translating, and validating a questionnaire in perioperative and pain medicine
Siny Tsang
Department of Epidemiology, Columbia University, New York, NY, USA
Colin F. Royse
1Department of Surgery, University of Melbourne, Melbourne, Victoria, Australia
2Department of Anesthesia and Pain Management, The Royal Melbourne Hospital, Parkville, Victoria, Australia
Abdullah Sulieman Terkawi
3Department of Anesthesiology, University of Virginia, Charlottesville, VA, USA
4Department of Anesthesiology, King Fahad Medical City, Riyadh, Saudi Arabia
5Outcomes Research Consortium, Cleveland, OH, USA
Abstract
The task of developing a new questionnaire or translating an existing questionnaire into a different language might be overwhelming. The greatest challenge perhaps is to come up with a questionnaire that is psychometrically sound, and is efficient and effective for use in research and clinical settings. This article provides guidelines for the development and translation of questionnaires for application in medical fields, with a special emphasis on perioperative and pain medicine. We provide a framework to guide researchers through the various stages of questionnaire development and translation. To ensure that the questionnaires are psychometrically sound, we present a number of statistical methods to assess the reliability and validity of the questionnaires.
Keywords: Anesthesia, development, questionnaires, translation, validation
Introduction
Questionnaires or surveys are widely used in perioperative and pain medicine research to collect quantitative information from both patients and health-care professionals. Data of interest could range from observable information (e.g., presence of lesion, mobility) to patients' subjective feelings about their current condition (e.g., the amount of pain they feel, psychological status). Although using an existing questionnaire will save time and resources,[1] a questionnaire that measures the construct of interest may not be readily available, or the published questionnaire may not be available in the language required for the targeted respondents. As a result, investigators may need to develop a new questionnaire or translate an existing one into the language of the intended respondents. Prior work has highlighted the wealth of literature available on psychometric principles, methodological concepts, and techniques regarding questionnaire development/translation and validation. Accordingly, this article is not meant to provide an exhaustive review of all the related statistical concepts and methods. Rather, this article aims to provide straightforward guidelines for the development or translation of questionnaires (or scales) for use in perioperative and pain medicine research for readers who may be unfamiliar with the process of questionnaire development and/or translation. Readers are encouraged to consult the cited references to further examine these techniques for application.
This article is divided into two main sections. The first discusses issues that investigators should be aware of in developing or translating a questionnaire. The second illustrates procedures to validate the questionnaire after it has been developed or translated. A model for the questionnaire development and translation process is presented in Figure 1. This special issue of the Saudi Journal of Anesthesia presents multiple studies on the development and validation of questionnaires in perioperative and pain medicine; we encourage readers to refer to them for practical experience.
Questionnaire development and translation processes
Preliminary Considerations
It is crucial to identify the construct that is to be assessed with the questionnaire, as the domain of interest will determine what the questionnaire will measure. The next question is: How will the construct be operationalized? In other words, what types of behavior will be indicative of the domain of interest? Several approaches have been suggested to help with this process,[2] such as content analysis, review of research, critical incidents, direct observations, expert judgment, and instruction.
Once the construct of interest has been determined, it is important to conduct a literature review to identify whether a previously validated questionnaire exists. A validated questionnaire refers to a questionnaire/scale that has been developed to be administered among the intended respondents. The validation processes should have been completed using a representative sample, demonstrating acceptable reliability and validity. Examples of necessary validation processes can be found in the validation section of this paper. If no existing questionnaires are available, or none are determined to be appropriate, it is appropriate to construct a new questionnaire. If a questionnaire exists, but only in a different language, the task is to translate and validate the questionnaire in the new language.
Developing a Questionnaire
To construct a new questionnaire, a number of issues should be considered even before writing the questionnaire items.
Identify the dimensionality of the construct
Many constructs are multidimensional, meaning that they are composed of several related components. To fully assess the construct, one may consider developing subscales to assess the different components of the construct. Next, are all the dimensions equally important, or are some more important than others? If the dimensions are equally important, one can assign the same weight to the questions (e.g., by summing or taking the average of all the items). If some dimensions are more important than others, it may not be reasonable to assign the same weight to the questions. Rather, one may consider examining the results from each dimension separately.
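As a minimal sketch of examining each dimension separately, the snippet below scores every subscale on its own by summing or averaging its items. The function name, the item labels, and the subscale layout are hypothetical and serve only as illustration.

```python
def subscale_scores(responses, subscales, use_mean=True):
    """Score each subscale separately.

    responses: dict mapping item label -> numeric response of one respondent.
    subscales: dict mapping subscale name -> list of item labels (hypothetical layout).
    use_mean:  average the items if True, otherwise sum them.
    """
    scores = {}
    for name, items in subscales.items():
        values = [responses[item] for item in items]
        total = sum(values)
        scores[name] = total / len(values) if use_mean else total
    return scores


# Hypothetical respondent and subscale definition.
responses = {"q1": 4, "q2": 2, "q3": 5}
subscales = {"pain": ["q1", "q2"], "mood": ["q3"]}
print(subscale_scores(responses, subscales))
```

Averaging keeps subscales with different numbers of items on the same response scale, whereas summing lets longer subscales contribute larger scores.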
Determine the format in which the questionnaire will be administered
Will the questionnaire be self-administered or administered by research/clinical staff? This decision depends, in part, on what the questionnaire intends to measure. If the questionnaire is designed to measure catastrophic thinking related to pain, respondents may be less likely to answer truthfully if a member of the research/clinical staff asks the questions, whereas they may be more likely to answer truthfully if they are allowed to complete the questionnaire on their own. If the questionnaire is designed to measure patients' mobility after surgery, respondents may be more likely to overreport the amount of mobility in an attempt to demonstrate recovery. To obtain a more accurate measure of mobility after surgery, it may be preferable to obtain objective ratings by clinical staff.
If respondents are to complete the questionnaire by themselves, the items need to be written in a way that can be easily understood by the majority of the respondents, generally about Grade 6 reading level.[3] If the questionnaire is to be administered to young respondents or respondents with cognitive impairment, the readability level of the items should be lowered. Questionnaires intended for children should take into consideration the cognitive stages of young people[4] (e.g., pictorial response choices may be more appropriate, such as pain faces to assess pain[5]).
Determine the item format
Will the items be open ended or close ended? Questions that are open ended allow respondents to elaborate upon their responses. As more detailed information may be obtained using open-ended questions, these items are best suited for situations in which investigators wish to gather more information about a specific domain. However, these responses are often more difficult to code and score, which increases the difficulty of summarizing individuals' responses. If multiple coders are included, researchers have to address the additional issue of inter-rater reliability.
Questions that are close ended provide respondents a limited number of response options. Compared to open-ended questions, these items are easier to administer and analyze. On the other hand, respondents may not be able to clarify their responses, and their responses may be influenced by the response options provided.
If close-ended items are to be used, should multiple-choice, Likert-type scales, true/false, or other close-ended formats be used? How many response options should be available? If a Likert-type scale is to be adopted, what scale anchors are to be used to indicate the degree of agreement (e.g., strongly agree, agree, neither, disagree, strongly disagree), frequency of an event (e.g., almost never, once in a while, sometimes, often, almost always), or other varying options? To make use of participants' responses for subsequent statistical analyses, researchers should keep in mind that items should be scaled to generate sufficient variance among the intended respondents.[6,7]
Item development
A number of guidelines have been suggested for writing items.[7] Items should be simple, short, and written in language familiar to the target respondents. The perspective should be consistent across items; items that assess affective responses (e.g., anxiety, depression) should not be mixed with those that assess behavior (e.g., mobility, cognitive functioning).[8] Items should assess only a single issue. Items that address more than one issue, or "double-barreled" items (e.g., "My daily activities and mood are affected by my pain."), should not be used. Avoid leading questions as they may result in biased responses. Items to which all participants would respond similarly (e.g., "I would like to reduce my pain.") should not be used, as the small variance generated will provide limited information about the construct being assessed. Table 1 summarizes important tips on writing questions.
Table 1
Tips on writing questions[15,16]
The issue of whether reverse-scored items should be used remains debatable. Since reverse-scored items are negatively worded, it has been argued that the inclusion of these items may reduce response set bias.[9] On the other hand, others have found a negative impact on the psychometric properties of scales that included negatively worded items.[10] In recent years, an increasing amount of literature reports problems with reverse-scored items.[11,12,13,14] Researchers who decide to include negatively worded items should take extra steps to ensure that the items are interpreted as intended by the respondents, and that the reverse-coded items have similar psychometric properties as the other regularly coded items.[7]
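If reverse-scored items are included, they have to be recoded before item scores are combined. The following is a minimal sketch, assuming a 1 to 5 Likert-type scale; the function names, the item labels, and the default scale bounds are illustrative assumptions rather than part of any specific questionnaire.

```python
def reverse_score(response, scale_min=1, scale_max=5):
    """Mirror a response on a scale_min..scale_max scale (e.g., 1 -> 5, 4 -> 2)."""
    if response < scale_min or response > scale_max:
        raise ValueError("response lies outside the scale")
    return scale_min + scale_max - response


def total_score(responses, reversed_items, scale_min=1, scale_max=5):
    """Sum one respondent's item scores after recoding the reverse-scored items.

    responses:      dict mapping item label -> response.
    reversed_items: set of item labels that are negatively worded (hypothetical).
    """
    total = 0
    for item, value in responses.items():
        if item in reversed_items:
            value = reverse_score(value, scale_min, scale_max)
        total += value
    return total


# Hypothetical respondent; item "q2" is negatively worded.
print(total_score({"q1": 5, "q2": 1, "q3": 4}, {"q2"}))
```

Keeping the list of reversed items in one place makes it easier to check that every negatively worded item is recoded exactly once.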
Determine the intended length of questionnaire
There is no rule of thumb for the number of items that make up a questionnaire. The questionnaire should contain sufficient items to measure the construct of interest, but not be so long that respondents experience fatigue or loss of motivation in completing the questionnaire.[17,18] Not only should a questionnaire possess the most parsimonious (i.e., simplest) structure,[19] but it also should consist of items that adequately represent the construct of interest to minimize measurement error.[20] Although a simple structure of questionnaire is recommended, a large pool of items is needed in the early stages of the questionnaire's development as many of these items might be discarded throughout the development process.[7]
Review and revise initial pool of items
After the initial pool of questionnaire items is written, qualified experts should review the items. Specifically, the items should be reviewed to make sure they are accurate, free of item construction problems, and grammatically correct. The reviewers should, to the best of their ability, ensure that the items do not contain content that may be perceived as offensive or biased by a particular subgroup of respondents.
Preliminary pilot testing
Before conducting a pilot test of the questionnaire on the intended respondents, it is advisable to test the questionnaire items on a small sample (about 30–50)[21] of respondents.[17] This is an opportunity for the questionnaire developer to know if there is confusion about any items, and whether respondents have suggestions for possible improvements of the items. One can also get a rough idea of the response distribution to each item, which can be informative in determining whether there is enough variation in the response to justify moving forward with a large-scale pilot test. Feasibility and the presence of floor (almost all respondents scored near the bottom) or ceiling effects (almost all respondents scored near the top) are important determinants of items that are included or rejected at this stage. Although it is possible that participants' responses to questionnaires may be affected by question order,[22,23,24] this issue should be addressed only after the initial questionnaire has been validated. The questionnaire items should be revised upon reviewing the results of the preliminary pilot testing. This process may be repeated a few times before the final draft of the questionnaire is settled.
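One simple way to screen an item for floor or ceiling effects is to compute the share of respondents who chose the lowest or the highest response option. The sketch below is illustrative only: the function name and the example data are hypothetical, and no cutoff is built in because the article does not specify one.

```python
def floor_ceiling_proportions(scores, scale_min, scale_max):
    """Return (floor, ceiling): the proportions of responses at the scale extremes."""
    if not scores:
        raise ValueError("at least one response is required")
    n = len(scores)
    floor = sum(1 for s in scores if s == scale_min) / n
    ceiling = sum(1 for s in scores if s == scale_max) / n
    return floor, ceiling


# Hypothetical responses to one item on a 1-5 scale.
item_responses = [1, 1, 1, 2, 5]
floor, ceiling = floor_ceiling_proportions(item_responses, 1, 5)
print(f"floor: {floor:.0%}, ceiling: {ceiling:.0%}")
```

An item with most responses piled at one extreme generates little variance and is a candidate for revision or removal at this stage.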
Summary
So far, we have highlighted the major steps that need to be undertaken when constructing a new questionnaire. Researchers should be able to clearly link the questionnaire items to the theoretical construct they intend to assess. Although such associations may be obvious to researchers who are familiar with the specific topic, they may not be apparent to other readers and reviewers. To develop a questionnaire with good psychometric properties that can later be applied in research or clinical practice, it is crucial to invest the time and effort to ensure that the items adequately assess the construct of interest.
Translating a Questionnaire
The following section summarizes the guidelines for translating a questionnaire into a different language.
Forward translation
The initial translation from the original language to the target language should be made by at least two independent translators.[25,26] Preferably, the bilingual translators should be translating the questionnaire into their mother tongue, to better reflect the nuances of the target language.[27] It is recommended that one translator be aware of the concepts the questionnaire intends to measure, to provide a translation that more closely resembles the original instrument. It is suggested that a naïve translator, who is unaware of the objective of the questionnaire, produce the second translation so that subtle differences in the original questionnaire may be detected.[25,26] Discrepancies between the two (or more) translators can be discussed and resolved between the original translators, or with the addition of an unbiased, bilingual translator who was not involved in the previous translations.
Backward translation
The initial translation should be independently back-translated (i.e., translated back from the target language into the original language) to ensure the accuracy of the translation. Misunderstandings or unclear wordings in the initial translations may be revealed in the back-translation.[25] As with the forward translation, the backward translation should be performed by at least two independent translators, preferably translating into their mother tongue (the original language).[26] To avoid bias, back-translators should preferably not be aware of the intended concepts the questionnaire measures.[25]
Expert committee
Constituting an expert committee is suggested to produce the prefinal version of the translation.[25] Members of the committee should include experts who are familiar with the construct of interest, a methodologist, both the forward and backward translators, and if possible, developers of the original questionnaire. The expert committee will need to review all versions of the translations and determine whether the translated and original versions achieve semantic, idiomatic, experiential, and conceptual equivalence.[25,28] Any discrepancies will need to be resolved, and members of the expert committee will need to reach a consensus on all items to produce a prefinal version of the translated questionnaire. If necessary, the process of translation and back-translation can be repeated.
Preliminary pilot testing
As with developing a new questionnaire, the prefinal version of the translated questionnaire should be pilot tested on a small sample (about 30–50)[21] of the intended respondents.[25,26] After completing the translated questionnaire, the respondent is asked (verbally by an interviewer or via an open-ended question) to elaborate on what they thought each questionnaire item and their corresponding response meant. This approach allows the investigator to make sure that the translated items retained the same meaning as the original items, and to ensure there is no confusion regarding the translated questionnaire. This process may be repeated a few times before the translated version of the questionnaire is finalized.
Summary
In this section, we provided a template for translating an existing questionnaire into a different language. Considering that most questionnaires were initially developed in one language (e.g., English when developed in English-speaking countries[25]), translated versions of the questionnaires are needed for researchers who intend to collect information among respondents who speak other languages. To compare responses across populations of different language and/or culture, researchers need to make sure that the questionnaires in different languages are assessing the equivalent construct with an equivalent metric. Although the translation process is time consuming and costly, it is the best method to ensure that a translated measure is equivalent to the original questionnaire.[28]
Validating a Questionnaire
Initial validation
After the new or translated questionnaire items pass through preliminary pilot testing and subsequent revisions, it is time to conduct a pilot test among the intended respondents for initial validation. In this pilot test, the final version of the questionnaire is administered to a large representative sample of respondents for whom the questionnaire is intended. If the pilot test is conducted with small samples, the relatively large sampling errors may reduce the statistical power needed to validate the questionnaire.[2]
Reliability
The reliability of a questionnaire can be considered as the consistency of the survey results. As measurement error is present in content sampling, changes in respondents, and differences across raters, the consistency of a questionnaire can be evaluated using its internal consistency, test-retest reliability, and inter-rater reliability, respectively.
Internal consistency
Internal consistency reflects the extent to which the questionnaire items are inter-correlated, or whether they are consistent in measurement of the same construct. Internal consistency is commonly estimated using the coefficient alpha,[29] also known as Cronbach's alpha. Given a questionnaire X, with k number of items, alpha (α) can be computed as:
$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^2}{\sigma_X^2}\right)$$
Where, $\sigma_i^2$ is the variance of item i, and $\sigma_X^2$ is the total variance of the questionnaire.
Cronbach's alpha typically ranges from 0 to 1, with higher values indicating that items are more strongly interrelated with one another. Negative values are possible when some items are negatively correlated with other items in the questionnaire. When this happens because reverse-scored items were incorrectly left unreversed, it can be easily remedied by scoring the items correctly. However, if a negative Cronbach's alpha is still obtained when all items are correctly scored, there are serious problems in the original design of the questionnaire. Cronbach's α = 0 indicates no internal consistency (i.e., none of the items are correlated with one another), whereas α = 1 reflects perfect internal consistency (i.e., all the items are perfectly correlated with one another). In practice, a Cronbach's alpha of at least 0.70 has been suggested to indicate adequate internal consistency.[30] A low Cronbach's alpha value may be due to poor inter-relatedness between items; as such, items with low correlations with the questionnaire total score should be discarded or revised. As alpha is a function of the length of the questionnaire, alpha will increase with the number of items. In addition, alpha will increase if the variability of each item is increased. It is, therefore, possible to increase alpha by including more related items, or adding items that have more variability to the questionnaire. On the other hand, an alpha value that is too high (α ≥ 0.90) suggests that some questionnaire items may be redundant;[31] investigators may consider removing items that are essentially asking the same thing in multiple ways.
It is important to note that Cronbach's alpha is a property of the responses from a specific sample of respondents.[31] Investigators need to keep in mind that Cronbach's alpha is not "the" estimate of reliability for a questionnaire under all circumstances. Rather, the alpha value only indicates the extent to which the questionnaire is reliable for "a particular population of examinees."[32] A questionnaire with excellent reliability with one sample may not necessarily have the same reliability in another. Therefore, the reliability of a questionnaire should be estimated each time the questionnaire is administered, including pilot testing and subsequent validation stages.
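To make the computation of coefficient alpha concrete, here is a minimal sketch using only the Python standard library. It assumes complete data (every respondent answered every item) and uses sample variances; the function name and the example responses are hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Compute Cronbach's alpha.

    item_scores: list of respondents, each a list with one score per item
                 (same item order for everyone, no missing values).
    """
    n_respondents = len(item_scores)
    k = len(item_scores[0])  # number of items
    if k < 2 or n_respondents < 2:
        raise ValueError("need at least two items and two respondents")

    # Variance of each item across respondents.
    item_variances = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Variance of the respondents' total scores.
    total_variance = variance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses from three respondents to a two-item questionnaire.
print(cronbach_alpha([[1, 2], [2, 1], [3, 3]]))
```

Because alpha is a property of the sample at hand, a function like this would be rerun on each new data set rather than quoted from an earlier study.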
Test-retest reliability
Test-retest reliability refers to the extent to which individuals' responses to the questionnaire items remain relatively consistent across repeated administration of the same questionnaire or alternate questionnaire forms.[2] Provided the same individuals were administered the same questionnaires twice (or more), test-retest reliability can be evaluated using Pearson's product moment correlation coefficient (Pearson's r) or the intraclass correlation coefficient.
Pearson's r between the two questionnaires' responses can be referred to as the coefficient of stability. A larger stability coefficient indicates stronger test-retest reliability, reflecting that measurement error of the questionnaire is less likely to be attributable to changes in the individuals' responses over time.
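The coefficient of stability can be sketched as a plain Pearson correlation between total scores from the two administrations. The code below is illustrative, assumes the two score lists are paired by respondent, and uses hypothetical data; it does not cover the intraclass correlation coefficient.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's product moment correlation between two paired score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired lists with at least two observations")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical total scores of five respondents at time 1 and time 2.
time_1 = [12, 18, 25, 30, 22]
time_2 = [14, 17, 27, 29, 20]
print(round(pearson_r(time_1, time_2), 3))
```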
Test-retest reliability can be considered the stability of respondents' attributes; it is applicable to questionnaires that are designed to measure personality traits, interests, or attitudes that are relatively stable across time, such as anxiety and pain catastrophizing. If the questionnaires are constructed to measure transitory attributes, such as pain intensity and quality of recovery, test-retest reliability is not applicable, as genuine changes in respondents' attributes between assessments would appear as instability in their responses. Although test-retest reliability is sometimes reported for scales that are intended to assess constructs that change between administrations, researchers should be aware that in such cases test-retest reliability is not applicable and does not provide useful information about the questionnaires of interest. Researchers should also be critical when evaluating the reliability estimates reported in such studies.
An important question to consider in estimating test-retest reliability is how much time should elapse between questionnaire administrations. If the duration between time 1 and time 2 is too short, individuals may remember their responses from time 1, which may overestimate the test-retest reliability. Respondents, especially those recovering from major surgery, may experience fatigue if the retest is administered shortly after the first administration, which may underestimate the test-retest reliability. On the other hand, if there is a long period of time between questionnaire administrations, individuals' responses may change due to other factors (e.g., a respondent may be taking pain management medications to treat a chronic pain condition). Unfortunately, there is no single answer. The duration should be long enough to allow the effects of memory to fade and to prevent fatigue, but not so long as to allow changes to take place that may affect the test-retest reliability estimate.[17]
Inter-rater reliability
For questionnaires in which multiple raters complete the same instrument for each examinee (e.g., a checklist of behaviors/symptoms), the extent to which raters are consistent in their observations across the same group of examinees can be evaluated. This consistency is referred to as the inter-rater reliability, or inter-rater agreement, and can be estimated using the kappa statistic.[33] Suppose two clinicians independently rated the same group of patients on their mobility after surgery (e.g., 0 = needs help of 2+ people; 1 = needs help of 1 person; 2 = independent), kappa (κ) can be computed as follows:
$$\kappa = \frac{P_o - P_e}{1 - P_e}$$
Where, $P_o$ is the observed proportion of observations in which the two raters agree, and $P_e$ is the expected proportion of observations in which the two raters agree by chance. Accordingly, κ is the proportion of agreement between the two raters, after factoring out the proportion of agreement by chance. κ typically ranges from 0 to 1, where κ = 0 indicates all chance agreements and κ = 1 represents perfect agreement between the two raters; negative values arise when the raters agree less often than expected by chance. Others have suggested κ = 0 as no agreement, κ = 0.01–0.20 as poor agreement, κ = 0.21–0.40 as slight agreement, κ = 0.41–0.60 as fair agreement, κ = 0.61–0.80 as good agreement, κ = 0.81–0.92 as very good agreement, and κ = 0.93–1 as excellent agreement.[34,35] If more than two raters are used, an extension of Cohen's κ statistic is available to compute the inter-rater reliability across multiple raters.[36]
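For the two-rater case, kappa can be sketched directly from the observed and chance-expected agreement. The snippet below is a minimal illustration; the function name and the ratings are hypothetical, and the chance agreement is computed from each rater's marginal category proportions, as in Cohen's kappa.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who rated the same examinees."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)

    # Observed agreement: share of examinees given the same category by both raters.
    p_o = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n

    # Chance agreement: product of the raters' marginal proportions, summed over categories.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical mobility ratings (0, 1, 2) of six patients by two clinicians.
clinician_1 = [0, 0, 1, 1, 2, 2]
clinician_2 = [0, 0, 1, 1, 2, 1]
print(cohens_kappa(clinician_1, clinician_2))
```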
Validity
The validity of a questionnaire is determined by analyzing whether the questionnaire measures what it is intended to measure. In other words, are the inferences and conclusions made based on the results of the questionnaire (i.e., test scores) valid?[37] Two major types of validity should be considered when validating a questionnaire: content validity and construct validity.
Content validity
Content validity refers to the extent to which the items in a questionnaire are representative of the entire theoretical construct the questionnaire is designed to assess.[17] Although the construct of interest determines which items are written and/or selected in the questionnaire development/translation phase, content validity of the questionnaire should be evaluated after the initial form of the questionnaire is available.[2] The process of content validation is particularly crucial in the development of a new questionnaire.
A panel of experts who are familiar with the construct that the questionnaire is designed to measure should be tasked with evaluating the content validity of the questionnaire. The experts judge, as a panel, whether the questionnaire items are adequately measuring the construct they are intended to assess, and whether the items are sufficient to measure the domain of interest. Several approaches to quantify the judgment of content validity across experts are also available, such as the content validity ratio[38] and content validation form.[39,40] Nonetheless, as the process of content validation depends heavily on how well the panel of experts can assess the extent to which the construct of interest is operationalized, the selection of appropriate experts is crucial to ensure that content validity is evaluated adequately. Example items to assess content validity include:[41]
- The questions were clear and easy
- The questions covered all the problem areas with your pain
- You would like the use of this questionnaire for future assessments
- The questionnaire lacks important questions regarding your pain
- Some of the questions violate your privacy.
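The content validity ratio mentioned above is not spelled out in this article. A widely used formulation, adopted here as an assumption, is CVR = (n_e − N/2) / (N/2), where n_e is the number of panel experts who rate an item as essential and N is the panel size. The sketch below applies that formula to one item; the function name and the numbers are hypothetical.

```python
def content_validity_ratio(n_essential, n_experts):
    """Content validity ratio for one item, assuming CVR = (n_e - N/2) / (N/2).

    n_essential: number of experts rating the item as essential.
    n_experts:   total number of experts on the panel.
    """
    if n_experts <= 0 or n_essential < 0 or n_essential > n_experts:
        raise ValueError("invalid panel counts")
    half = n_experts / 2
    return (n_essential - half) / half


# Hypothetical panel: 8 of 10 experts rate the item as essential.
print(content_validity_ratio(8, 10))
```

Under this formulation the ratio is 1 when every expert rates the item as essential, 0 when exactly half do, and negative when fewer than half do.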
A concept that is related to content validity is face validity. Face validity refers to the degree to which the respondents or laypersons judge the questionnaire items to be valid. Such judgment is based less on the technical components of the questionnaire items, but rather on whether the items appear to be measuring a construct that is meaningful to the respondents. Although this is the weakest way to establish the validity of a questionnaire, face validity may motivate respondents to answer more truthfully. For example, if patients perceive a quality of recovery questionnaire to be evaluating how well they are recovering from surgery, they may be more likely to respond in ways that reflect their recovery status.
Construct validity
Construct validity is the most important concept in evaluating a questionnaire that is designed to measure a construct that is not directly observable (e.g., pain, quality of recovery). If a questionnaire lacks construct validity, it will be difficult to interpret results from the questionnaire, and inferences cannot be drawn from questionnaire responses to a behavior domain. The construct validity of a questionnaire can be evaluated by estimating its association with other variables (or measures of a construct) with which it should be correlated positively, negatively, or not at all.[42] In practice, the questionnaire of interest, as well as the preexisting instruments that measure similar and different constructs, is administered to the same groups of individuals. Correlation matrices are then used to examine the expected patterns of associations between different measures of the same construct, and those between a questionnaire of a construct and other constructs. It has been suggested that correlation coefficients of 0.1 should be considered as small, 0.3 as moderate, and 0.5 as large.[43]
For example, suppose a new scale is developed to assess pain among hospitalized patients. To provide evidence of construct validity for this new pain scale, we can examine how well patients' responses on the new scale correlate with the preexisting instruments that also measure pain. This is referred to as convergent validity. One would expect strong correlations between the new questionnaire and the existing measures of the same construct, since they are measuring the same theoretical construct.
Alternatively, the extent to which patients' responses on the new pain scale correlate with instruments that measure unrelated constructs, such as mobility or cognitive function, can be assessed. This is referred to as divergent validity. As pain is theoretically dissimilar to the constructs of mobility or cognitive function, we would expect no, or very weak, correlation between the new pain questionnaire and instruments that assess mobility or cognitive function. Table 2 describes different validation types and important definitions.
Table 2
Questionnaire-related terminology[16,44,45]
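When reading a correlation matrix for convergent and divergent validity, it can help to label each coefficient with the magnitude benchmarks quoted earlier (0.1 small, 0.3 moderate, 0.5 large). The sketch below treats those benchmarks as lower bounds on the absolute correlation; that reading, the "negligible" label for values below 0.1, and the example coefficients are assumptions made for illustration.

```python
def describe_correlation(r):
    """Label a correlation coefficient using the 0.1 / 0.3 / 0.5 benchmarks."""
    if r < -1 or r > 1:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    size = abs(r)
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "moderate"
    if size >= 0.1:
        return "small"
    return "negligible"  # label assumed here; not defined in the article


# Hypothetical correlations of a new pain scale with other instruments.
correlations = {
    "existing pain scale": 0.62,   # convergent: expected to be strong
    "mobility measure": -0.05,     # divergent: expected to be near zero
    "cognitive function": 0.12,    # divergent: expected to be weak
}
for instrument, r in correlations.items():
    print(f"{instrument}: r = {r:+.2f} ({describe_correlation(r)})")
```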
Subsequent validation
The process described so far defines the steps for initial validation. However, the usefulness of a scale lies in its ability to discriminate between different cohorts in the domain of interest. It is advised that several studies investigating different cohorts or interventions should be conducted to identify whether the scale can discriminate between groups. Ideally, these studies should have clearly defined outcomes where the changes in the domain of interest are well known. For example, in subsequent validation of the Postoperative Quality of Recovery Scale, four studies were constructed to show the ability to discriminate recovery and cognition in different cohorts of participants (mixed cohort, orthopedics, and otolaryngology), as well as a human volunteer study to calibrate the cognitive domain.[46,47,48,49]
Sample size
Guidelines for the respondent-to-item ratio range from 5:1[50] (i.e., 50 respondents for a 10-item questionnaire) and 10:1[30] to 15:1 or 30:1.[51] Others suggested that sample sizes of 50 should be considered very poor, 100 poor, 200 fair, 300 good, 500 very good, and 1000 or more excellent.[52] Given the variation in the types of questionnaire being used, there are no absolute rules for the sample size needed to validate a questionnaire.[53] As larger samples are always better than smaller samples, it is recommended that investigators use as large a sample size as possible. The respondent-to-item ratios can be used to further strengthen the rationale for a large sample size when necessary.
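The arithmetic behind these rules of thumb can be sketched as follows. The function names are hypothetical, and treating each benchmark (50, 100, 200, 300, 500, 1000) as the lower bound of a band is an assumption of this sketch; the cited source gives labels only for those specific sample sizes.

```python
def required_respondents(n_items, ratio):
    """Respondents needed for a given respondent-to-item ratio (e.g., 5 for 5:1)."""
    if n_items < 1 or ratio < 1:
        raise ValueError("n_items and ratio must be positive")
    return n_items * ratio


def label_sample_size(n):
    """Label a sample size with the very poor ... excellent benchmarks.[52]

    Assumption: each benchmark is the lower bound of its band, and
    anything below 100 is labeled "very poor".
    """
    bands = [(1000, "excellent"), (500, "very good"), (300, "good"),
             (200, "fair"), (100, "poor")]
    for lower_bound, label in bands:
        if n >= lower_bound:
            return label
    return "very poor"


# A 10-item questionnaire under each of the suggested ratios.
n_items = 10
for ratio in (5, 10, 15, 30):
    n = required_respondents(n_items, ratio)
    print(f"{ratio}:1 -> {n} respondents ({label_sample_size(n)})")
```

Comparing the two rules side by side shows that a ratio that looks adequate for a short questionnaire can still fall into a low band on the absolute-size benchmarks.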
Other considerations
Even though data collection using questionnaires is relatively easy, researchers should be cognizant of the approvals that must be obtained before beginning the research project. Considering the differences in regulations and requirements across countries, agencies, and institutions, researchers are advised to consult the research ethics committee at their agencies and/or institutions regarding the necessary approval and any additional considerations that should be addressed.
Conclusion
In this review, we provided guidelines on how to develop, validate, and translate a questionnaire for use in perioperative and pain medicine. The development and translation of a questionnaire require investigators' thorough consideration of issues relating to the format of the questionnaire and the meaning and appropriateness of the items. Once the development or translation stage is completed, it is important to conduct a pilot test to ensure that the items can be understood and correctly interpreted by the intended respondents. The validation stage is crucial to ensure that the questionnaire is psychometrically sound. Although developing and translating a questionnaire is no easy task, the processes outlined in this article should enable researchers to end up with questionnaires that are efficient and effective in the target populations.
Financial support and sponsorship
Siny Tsang, PhD, was supported by the research training grant 5-T32-MH 13043 from the National Institute of Mental Health.
Conflicts of interest
There are no conflicts of interest.
References
1. Boynton PM, Greenhalgh T. Selecting, designing, and developing your questionnaire. BMJ. 2004;328:1312–5.
2. Crocker L, Algina J. Introduction to Classical and Modern Test Theory. Mason, Ohio: Cengage Learning; 2008.
3. Davis TC, Mayeaux EJ, Fredrickson D, Bocchini JA, Jr, Jackson RH, Murphy PW. Reading ability of parents compared with reading level of pediatric patient education materials. Pediatrics. 1994;93:460–8.
4. Bell A. Designing and testing questionnaires for children. J Res Nurs. 2007;12:461–9.
5. Wong DL, Baker CM. Pain in children: Comparison of assessment scales. Okla Nurse. 1988;33:8.
6. Stone E. Research Methods in Organizational Behavior. Glenview, IL: Scott Foresman; 1978.
7. Hinkin TR. A brief tutorial on the development of measures for use in survey questionnaires. Organ Res Methods. 1998;2:104–21.
8. Harrison DA, McLaughlin ME. Cognitive processes in self-report responses: Tests of item context effects in work attitude measures. J Appl Psychol. 1993;78:129–40.
9. Price JL, Mueller CW. Handbook of Organizational Measurement. Marshfield, MA: Pitman; 1986.
10. Harrison DA, McLaughlin ME. Exploring the Cognitive Processes Underlying Responses to Self-Report Instruments: Effects of Item Content on Work Attitude Measures. Academy of Management Annual Meetings. 1991:310–4.
11. Embretson SE, Reise SP. Item Response Theory for Psychologists. Mahwah, N.J: Lawrence Erlbaum Associates, Publishers; 2000.
12. Lindwall M, Barkoukis V, Grano C, Lucidi F, Raudsepp L, Liukkonen J, et al. Method effects: The problem with negatively versus positively keyed items. J Pers Assess. 2012;94:196–204.
13. Stansbury JP, Ried LD, Velozo CA. Unidimensionality and bandwidth in the Center for Epidemiologic Studies Depression (CES-D) Scale. J Pers Assess. 2006;86:10–22.
14. Tsang S, Salekin RT, Coffey CA, Cox J. A comparison of self-report measures of psychopathy among non-forensic samples using item response theory analyses. Psychol Assess. [In press]
15. Leung WC. How to design a questionnaire. Stud BMJ. 2001;9
16. Artino AR, Jr, La Rochelle JS, Dezee KJ, Gehlbach H. Developing questionnaires for educational research: AMEE Guide No 87. Med Teach. 2014;36:463–74.
17. Schultz KS, Whitney DJ. Measurement Theory in Action: Case Studies and Exercises. Thousand Oaks, CA: Sage; 2005.
18. Schmitt NW, Stults DM. Factors defined by negatively keyed items: The results of careless respondents? Appl Psychol Meas. 1985;9:367–73.
19. Thurstone LL. Multiple-Factor Analysis. Chicago, IL: University of Chicago Press; 1947.
20. Churchill GA. A paradigm for developing better measures of marketing constructs. J Mark Res. 1979;16:64–73.
21. Perneger TV, Courvoisier DS, Hudelson PM, Gayet-Ageron A. Sample size for pre-tests of questionnaires. Qual Life Res. 2015;24:147–51.
22. Bowling A, Windsor J. The effects of question order and response-choice on self-rated health status in the English Longitudinal Study of Ageing (ELSA) J Epidemiol Community Health. 2008;62:81–5.
23. Lee S, Schwarz N. Question context and priming meaning of health: Effect on differences in self-rated health between Hispanics and non-Hispanic Whites. Am J Public Health. 2014;104:179–85.
24. Schwarz N. Self-reports: How the questions shape the answers. Am Psychol. 1999;54:93–105.
25. Guillemin F, Bombardier C, Beaton D. Cross-cultural adaptation of health-related quality of life measures: Literature review and proposed guidelines. J Clin Epidemiol. 1993;46:1417–32.
26. Beaton D, Bombardier C, Guillemin F, Ferraz M. Recommendations for the Cross-Cultural Adaptation of the DASH and Quick DASH Outcome Measures. Toronto: Institute for Work and Health; 2007.
27. Hendricson WD, Russell IJ, Prihoda TJ, Jacobson JM, Rogan A, Bishop GD, et al. Development and initial validation of a dual-language English-Spanish format for the Arthritis Impact Measurement Scales. Arthritis Rheum. 1989;32:1153–9.
28. Beaton DE, Bombardier C, Guillemin F, Ferraz MB. Guidelines for the process of cross-cultural adaptation of self-report measures. Spine (Phila Pa 1976) 2000;25:3186–91.
29. Cronbach LJ. Coefficient alpha and the internal structure of tests. Psychometrika. 1951;16:297–334.
30. Nunnally J. Psychometric Theory. New York: McGraw-Hill; 1978.
31. Streiner DL. Starting at the beginning: An introduction to coefficient alpha and internal consistency. J Pers Assess. 2003;80:99–103.
32. Wilkinson L the Task Force on Statistical Inference. Statistical methods in psychology journals: Guidelines and explanations. Am Psychol. 1999;54:594–604.
33. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas. 1960;20:37–46.
34. Dawson B, Trapp RG. Basic and Clinical Biostatistics. 3rd ed. Norwalk, Conn: Lange Medical Books; 2001.
35. Grootscholten C, Bajema IM, Florquin S, Steenbergen EJ, Peutz-Kootstra CJ, Goldschmeding R, et al. Inter-observer agreement of scoring of histopathological characteristics and classification of lupus nephritis. Nephrol Dial Transplant. 2008;23:223–30.
36. Berry KJ, Mielke PW. A generalization of Cohen's kappa agreement measure to interval measurement and multiple raters. Educ Psychol Meas. 1988;48:921–33.
37. Murphy KR, Davidshofer CO. Psychological Testing: Principles and Applications. Upper Saddle River, NJ: Prentice Hall; 2001.
38. Lawshe CH. A quantitative approach to content validity. Pers Psychol. 1975;28:563–75.
39. Barrett RS. Content validation form. Public Pers Manage. 1992;21:41–52.
40. Barrett RS, editor. Fair Employment Strategies in Human Resource Management. Westport, CT: Quorum Books/Greenwood; 1996. Content validation form; pp. 47–56.
41. Alnahhal A, May S. Validation of the Arabic version of the Quebec Back Pain Disability Scale. Spine (Phila Pa 1976) 2012;37:E1645–50.
42. Cronbach L, Meehl P. Construct validity in psychological tests. Psychol Bull. 1955;52:281–302.
43. Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, NJ: Erlbaum; 1988.
44. Anthoine E, Moret L, Regnault A, Sébille V, Hardouin JB. Sample size used to validate a scale: A review of publications on newly-developed patient reported outcomes measures. Health Qual Life Outcomes. 2014;12:176.
45. Reeve BB, Wyrwich KW, Wu AW, Velikova G, Terwee CB, Snyder CF, et al. ISOQOL recommends minimum standards for patient-reported outcome measures used in patient-centered outcomes and comparative effectiveness research. Qual Life Res. 2013;22:1889–905.
46. Newman S, Wilkinson DJ, Royse CF. Assessment of early cognitive recovery after surgery using the Post-operative Quality of Recovery Scale. Acta Anaesthesiol Scand. 2014;58:185–91.
47. Royse CF, Newman S, Williams Z, Wilkinson DJ. A human volunteer study to identify variability in performance in the cognitive domain of the postoperative quality of recovery scale. Anesthesiology. 2013;119:576–81.
48. Royse CF, Williams Z, Purser S, Newman S. Recovery after nasal surgery vs. tonsillectomy: Discriminant validation of the Postoperative Quality of Recovery Scale. Acta Anaesthesiol Scand. 2014;58:345–51.
49. Royse CF, Williams Z, Ye G, Wilkinson D, De Steiger R, Richardson M, et al. Knee surgery recovery: Post-operative Quality of Recovery Scale comparison of age and complexity of surgery. Acta Anaesthesiol Scand. 2014;58:660–7.
50. Gorusch RL. Factor Analysis. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates; 1983.
51. Pedhazur RJ. Multiple Regression in Behavioral Research: Explanation and Prediction. Fort Worth, TX: Harcourt Brace College Publishers; 1997.
52. Comfrey AL, Lee HB. A First Course in Factor Analysis. Hillsdale, NJ: Lawrence Erlbaum Associates; 1992.
53. Osborne JW, Costello AB. Sample size and subject to item ratio in principal components analysis. Pract Assess Res Eval. 2004;9:8.
Source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5463570/