### A

*absolute frequency*

The number of cases or respondents appearing in each category of a frequency distribution or in each cell of a cross-tabulation table.

*absolute value*

The value of a number, ignoring the plus or minus sign.

*accessibility bias*

One type of selection bias during sampling, where some respondents in the population are over- or underrepresented because they are more or less accessible than others.

*acquiescence bias*

Agreement or consent by respondents to what they believe an interviewer or sponsor would like them to think or say rather than giving their real opinion or reaction.

*action component*

One of the three basic components of an attitude, indicating the person’s consistent tendency to take action regarding the topic or to remain passive about it.

*adjective checklist*

A scaling device that lists a series of adjectives that might be used to describe some person, place, or thing and asks the respondent to check any adjective that applies.

*affiliations*

The network of durable, formal, and informal associations an individual has with family, relatives, friends, and acquaintances.

*affinity bias*

A form of interviewer bias resulting from interviewers showing preference for certain types of people for whom they have an affinity, such as respondents who are similar to them or whom they find attractive, and including them in the sample at higher rates than others.

*aided recall*

A form of questioning respondents about what they remember, where their memory is aided by presenting or describing the things they might recall.

*all-inclusive categories*

Response categories defined to ensure that every feasible answer will fit into a category; that is, there will be a category for every possible answer.

*alpha level*

The critical value, or probability level above which a relationship between variables will not be regarded as statistically significant because it is too likely that it could result only by chance from sampling error if the variables were actually not related in the population as a whole; also refers to the probability of a Type I error in academic research.

*alphanumeric variable*

A variable whose “values” are characters (words or letters) rather than numeric values.

*alternative hypothesis*

The proposition that some condition or relationship exists, accepted in scientific or academic research, if the results fail to support the “null hypothesis” that it does not exist.

*analysis of variance (ANOVA)*

A statistical measure of the association between a categorical independent variable and a continuous, numerical, dependent variable from an interval or ratio scale, used to assess the significance of differences among means for different groups.

*area sampling*

A form of cluster sampling where the region of the population is first divided into areas, some are randomly selected, and then respondents within those areas are randomly selected.

*attitude*

Relatively durable, psychological predispositions of people to respond toward or against an object, person, place, idea, or symbol, consisting of three components: their knowledge or beliefs, their feelings or evaluations, and their tendency toward action or passivity.

*attitude scale*

A scale used to measure attitudes, usually focusing on the respondents’ feelings or evaluations toward one or more objects or topics.

*auspices bias*

The tendency for respondents to react toward the survey sponsor, if they know who the sponsor is, rather than providing their own, honest reactions to the survey questions themselves.

*average*

A measure of central tendency that represents the most typical case, usually referring to the arithmetic mean, but also applying to the median and the mode.

### B

*banner*

A method for showing several cross-tabulations in one, condensed table in order to save space or facilitate comparison, ordinarily used only when one variable is cross-tabulated against several others.

*bar chart*

A graphic portrayal of several quantities, such as frequencies or percentages, where the length of the horizontal or vertical bars represents the relative magnitudes of the values.

*behavior*

The actions of people or objects in the past or at the present.

*bias*

The tendency for some extraneous factor to affect the answers to survey questions or the survey results in general, in a systematic way, so that results are “pushed” or “pulled” in some specific direction.

*bimodality*

The existence of two modes or peaks in a distribution of response, rather than a single modal value, often caused when the sample contains two, distinct populations or groups with differing reactions.

*bipolar adjectives*

A pair of adjectives, such as those used in a semantic differential scale, that represent the polar extremes on one dimension or continuum.

*bivariate relationship*

A relationship between only two variables.

*bivariate statistics*

The statistics used to measure the relationship between only two variables and assess its statistical significance.

*bottle scale*

A pictorial scale showing a series of bottles filled to varying levels, sometimes used when surveying those who may not always understand verbal or numeric scales, such as young children or those with reading impairments.

*breakdowns*

A data analysis procedure that computes and reports the mean, standard deviation, and number of cases for a continuous numeric variable for each level of a categorical variable, ordinarily using analysis of variance to measure the statistical significance of differences in mean values.
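As a rough illustration of a breakdown, the sketch below reports the mean, standard deviation, and number of cases for each category. The data, the category labels, and the function name `breakdown` are hypothetical, and the sample standard deviation is assumed.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical cases: (category of the classification variable, numeric rating)
cases = [
    ("renter", 3), ("owner", 5), ("renter", 4),
    ("owner", 4), ("renter", 2), ("owner", 6),
]

def breakdown(cases):
    """Return {category: (mean, sample standard deviation, n)}."""
    groups = defaultdict(list)
    for category, value in cases:
        groups[category].append(value)
    # stdev() needs at least two cases per category
    return {
        category: (mean(values), stdev(values), len(values))
        for category, values in groups.items()
    }

for category, (m, sd, n) in sorted(breakdown(cases).items()):
    print(f"{category}: mean={m:.2f} sd={sd:.2f} n={n}")
```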

### C

*callbacks*

The second and subsequent attempts to contact respondents by telephone or in person when they were not present to respond to the first attempt to contact them.

*case*

A set of data obtained from one completed questionnaire or one respondent that serves as a single unit for analysis.

*categorical data*

Nominal data where the values of the variables are merely the names of discrete, independent categories and the numeric magnitudes have no meaning or stand in no fixed relationship, as opposed to continuous data.

*categorical item*

A survey question coded with values that are merely the names of categories, so that the values do not represent magnitudes or stand in any ordered relationship with one another.

*causality*

The potential influence or effect that one item or variable has on another.

*ceiling effects*

The truncation or “chopping off” of the high side of a distribution because respondents’ answers could go no further up the scale.

*census*

Counting or taking measurements from all members of a given population, rather than sampling only a portion to represent the whole.

*central systems*

Computer systems that do the actual processing at some central location accessible by communication to several operators or users.

*central tendency measures*

Statistical averages that describe the most typical value or case, such as the mean, median, and mode.

*chi-square*

A value, usually obtained from cross-tabulation of two items in survey research, that can be compared with the values of the chi-square distribution to obtain a probability for assessing statistical significance.

*chi-square distribution*

The particular form of a distribution derived from a set of computations and defined by the number of “degrees of freedom,” often listed in statistical reference tables.

*classification variables*

Survey items, such as demographic variables, that are used to classify respondents into groups or categories for comparison.

*closed-ended questions*

Structured survey questions where the alternative answers are listed so that respondents must ordinarily pick only from among them.

*cluster*

One group of individuals or sampling units that have proximity with one another within the sample frame in some respect, such as those within a given area.

*cluster bias*

A form of selection bias resulting when a cluster sampling design selects respondents who are too closely related to one another within a cluster, so that they tend to give similar responses.

*cluster sampling*

A technique often used in surveys to save travel or long-distance toll charges where the population is divided into clusters and a few clusters, each containing many respondents, are randomly selected.

*codebook*

The entire set of codelists for several variables from one survey.

*codelist*

The list of code values and category labels for a single survey variable that is generated by postcoding and used during analysis and reporting.

*coding*

The process of assigning code values to the various alternative answers to survey questions, either when constructing the questionnaire (precoding) or after the data collection (postcoding).

*coefficient of determination*

The square of the correlation coefficient, indicating the proportion of “shared” variance for correlation or the proportion of variance in the dependent variable that was “explained” by the independent variable in regression.

*comparative scale*

A scale using one entity as the standard by which one or more others are judged or evaluated.

*computer hardware*

The physical devices and components of computers, including both the central processing unit and the peripheral devices for inputting, storing, and outputting data.

*computer software*

The programs and coded instructions to the computer, including both the operating system that provides general control and the applications programs that perform specific computations.

*conditional branching*

Instructions or “go-to” statements in a questionnaire indicating the interviewer or respondent should skip items that don’t apply, based on answers to previous questions.

*confidence interval*

The range around a numeric statistical value obtained from a sample, within which the actual, corresponding value for the population is likely to fall, at a given level of probability.
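One common way to compute such a range for a sample mean is sketched below. It assumes a large-sample normal approximation (mean plus or minus z times the standard error); the function name and the ratings are hypothetical, and small samples would usually call for the t-distribution instead.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(values, level=0.95):
    """Normal-approximation confidence interval for a sample mean."""
    n = len(values)
    # Two-sided z value, roughly 1.96 for a 95% level
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * stdev(values) / sqrt(n)  # z times the standard error
    center = mean(values)
    return center - margin, center + margin

ratings = [4, 5, 3, 4, 4, 5, 2, 4, 3, 4]  # hypothetical survey ratings
low, high = mean_confidence_interval(ratings)
print(f"95% interval: {low:.2f} to {high:.2f}")
```

Raising `level` widens the interval, since a higher probability of containing the population value requires a broader range.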

*confidence level*

The specific probability of obtaining some result from a sample if it did not exist in the population as a whole, at or below which the relationship will be regarded as statistically significant.

*continuous variable*

A variable that represents a continuum without any breaks or interruptions, so the numeric values could potentially take on an infinite number of values expressed in whole numbers and fractions.

*convenience sample*

A sample selected more on the basis of the researcher or data collection team’s convenience than on the requirements for random selection with a known probability of inclusion and representation.

*correlation analysis*

A measure of the relationship or association between two continuous numeric variables that indicates both the direction and degree to which they co-vary with one another from case to case, without implying that one is causing the other.

*correlation coefficient*

The value computed with correlation analysis, ranging from zero to indicate no systematic relationship to plus or minus one, indicating a perfect linear relationship, where the positive or negative value shows if the relationship is direct or inverse, respectively.
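A minimal sketch of the computation, assuming the Pearson product-moment coefficient is the one intended. The function name and data are hypothetical. Squaring the result gives the coefficient of determination.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products of deviations, and each variable's sum of squares
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)

hours = [1, 2, 3, 4, 5]    # hypothetical independent variable
scores = [2, 4, 5, 4, 5]   # hypothetical dependent variable
r = pearson_r(hours, scores)
print(f"r = {r:.3f}, r squared = {r ** 2:.3f}")
```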

*correlation matrix*

The correlation coefficients between each pair, for several variables, arranged so that each variable is identified on each row and on each column, with the coefficient listed in the cells defined by the rows and columns.

*critical value*

The probability level above which a relationship between variables will not be regarded as statistically significant because it is too likely that it could result only by chance from sampling error if the variables were actually not related in the population as a whole.

*cross-tabulation*

Plotting two categorical variables in the form of a matrix so that the values of one variable define the rows and the values of the other define the columns, with the cells containing the frequency of cases with a given value for each of the two items and from which a chi-square value can be computed to assess the statistical significance of the relationship.
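The matrix described here can be built by counting paired answers. The sketch below is only an illustration; the variable values and the function name `cross_tabulate` are hypothetical.

```python
from collections import Counter

# Hypothetical paired answers: (row variable, column variable) for each case
answers = [
    ("female", "yes"), ("female", "no"), ("male", "yes"),
    ("female", "yes"), ("male", "no"), ("male", "no"),
]

def cross_tabulate(pairs):
    """Return (row labels, column labels, table of cell frequencies)."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # A Counter returns 0 for combinations that never occurred
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

rows, cols, table = cross_tabulate(answers)
for label, cells in zip(rows, table):
    print(label, dict(zip(cols, cells)))
```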

*curvilinear*

A line or distribution of values that is continuous, but forms an arc, rather than a straight line.

### D

*data*

Most often numbers, but also letters or words that symbolize or represent quantities, entities, or categories of things.

*data analysis*

The manipulation of numbers, letters, or symbols in order to suppress the detail and reveal the relevant facts or relationships.

*data collection*

The process of communicating questions and obtaining a record of responses from a sample, either by mail, telephone, or personal interviewing.

*data field*

The location and number of columns in a data file record required to contain the largest number of digits for any code value for a particular variable.

*data point*

A datum, or one single entry of a number, letter, or symbol, usually for one variable and one case or respondent.

*data processing*

Submitting the survey data to computer programs and routines in order to perform the statistical analysis and to generate reports, as opposed to hand tabulation of the data.

*decisions*

The process and/or results of individual evaluations, judgments, and choices among alternatives.

*degrees of freedom (d.f.)*

A parameter most often based on the number of cases or respondents, but slightly reduced to adjust for some earlier computations and used when checking reference tables or computing probability to assess statistical significance.

*demographics*

A set of conditions or attributes of people, often including age, sex, marital status, education, employment, occupation, and income, among others, usually measured in surveys to determine the types of people represented by the sample and to make comparisons of other results among demographic groups.

*dependent variable*

The variable that is viewed as being potentially influenced, affected, or determined by some other variable in a cause-and-effect relationship, based on the logic and meaning of the things represented by the variables.

*depicted scale*

Any scale that cannot be included within the survey question, so that it must be shown in the form of numbers, words, or pictures representing response alternatives.

*descriptive research*

Research that is designed primarily to describe rather than to explain a set of conditions, characteristics, or attributes of people in a population, based on measurement of a sample.

*descriptive statistics*

Statistics such as averages and measures of spread, used to suppress the detail in data files and to condense and summarize the data to make facts more visible, as well as to indicate the degree to which the sample data are likely to represent the entire population.

*desires*

Expressed wishes or conscious urges of respondents that may be a topic for survey research measurement within the broader topic category of “needs.”

*desktop systems*

Computer systems that do some or all of the processing right at the operator’s desk or work station.

*diagram scale*

Any form of scale that uses a diagram to depict the response options or to obtain or collect answers to survey questions.

*dichotomous question*

A question with only two response alternatives, such as a yes/no question or an item that can either be checked or ignored.

*discrete variable*

A categorical variable yielding nominal data, where all of the answers must fall within a category and the code values stand in no ordered relationship to one another.

*discriminant analysis*

A statistical measure of the relationship between a continuous, numeric independent variable from an interval or ratio scale and a categorical dependent variable defining two or more groups, used both to assess statistical significance and also to compute the discriminant function, used to predict or classify new cases into groups.

*discriminant function*

The prediction or classification equation obtained from discriminant analysis, used to predict or classify new cases into groups when only the value of the independent variable is known.

*dispersion*

The range and degree of spread or variance in the distribution of data for a survey variable.

### E

*editing*

The process of examining questionnaires or data against some set of criteria, to be sure the content is correct or appropriate.

*executive summary*

The first section of the final report, including only an outline of the major highlights of the results.

*expected cell frequency*

A value computed during or after cross-tabulation, based on the proportion of the data represented by the entire row and column on which the cell resides.
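A sketch of the usual computation, assuming the standard formula: each cell's expected frequency is its row total times its column total, divided by the table total. The Pearson chi-square value then sums the squared differences between observed and expected frequencies, each divided by the expected frequency. The function names and the table are hypothetical.

```python
def expected_frequencies(observed):
    """Expected frequency per cell: row total * column total / table total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def chi_square(observed):
    """Pearson chi-square: sum of (observed - expected)^2 / expected."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

observed = [[30, 20], [10, 40]]  # hypothetical 2 x 2 cross-tabulation
print(expected_frequencies(observed))
print(round(chi_square(observed), 3))
```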

*explicit scale*

Any scale that is directly expressed or stated, either verbally or visually, as opposed to those that are only implied by the question.

*extreme case*

A response that is an outlier, with a value so extreme it is far distant from any other response, which may sometimes suggest there has been an error while recording or transferring the data.

*extremity*

The extreme or terminal point, limit, or part of something, such as the upper or lower limit of a scale or distribution of scores.

### F

*F-distribution*

The particular form of a distribution derived from a set of computations and defined by two numbers of “degrees of freedom,” often listed in statistical reference tables.

*F-ratio*

The ratio of a numerator and denominator value consisting of variance expressed as “mean squares,” or the “sum of squares” divided by the degrees of freedom for each, usually computed with analysis of variance and compared to the F-distribution in a statistical reference table to assess statistical significance.
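For a one-way analysis of variance, the ratio can be sketched as the between-groups mean square divided by the within-groups mean square. The function name and the group data below are hypothetical, and the sketch covers only the one-way case.

```python
def f_ratio(groups):
    """One-way ANOVA F-ratio from a list of groups of numeric values."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Sums of squares between groups and within groups
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )

    # Each mean square is a sum of squares divided by its degrees of freedom
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

groups = [[2, 3, 4], [4, 5, 6], [6, 7, 8]]  # hypothetical ratings by group
print(f_ratio(groups))
```

The result would then be compared with the F-distribution for `df_between` and `df_within` degrees of freedom.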

*feeling component*

One of the three main components of attitudes, consisting of the evaluations and judgments of the topic by the individual holding the attitude.

*fixed sum scale*

A particular type of scale where the respondent is asked to list the number of times each of a set of alternatives occurs or applies, out of a given total, so that the sum of the values must equal the total.

*floor effects*

The truncation or “chopping off” of the low side of a distribution because respondents’ answers could go no lower on the scale.

*forced ranking scale*

A type of scale, yielding ordinal level data, where the respondents are instructed to rank a series of items in sequential order, with no “ties” or equal rankings allowed.

*formatting*

The design or a particular arrangement of words, numbers, or symbols, specifying their order or sequence, physical location, relative distance or proximity, and general form, often used in reference to the location of data fields within a file record.

*frequency distribution*

The number of cases that contain each of the scale values for a particular survey item or variable.

*frequency table*

A tabular presentation of the frequency distribution, often including percentage distributions based on the frequencies and the sample size, the number of valid cases, and the cumulative number of valid cases.
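A small sketch of how such a table might be assembled. The function name, the sample responses, and the use of `None` to mark a missing answer are assumptions made for illustration.

```python
from collections import Counter

def frequency_table(values):
    """Rows of (value, frequency, percent of valid cases, cumulative frequency)."""
    valid = [v for v in values if v is not None]  # None marks a missing answer
    counts = Counter(valid)
    n = len(valid)
    rows, running = [], 0
    for value in sorted(counts):
        running += counts[value]
        rows.append((value, counts[value], 100 * counts[value] / n, running))
    return rows

responses = [1, 2, 2, 3, 3, 3, None, 2, 1, 3]  # hypothetical scale answers
for value, freq, pct, cum in frequency_table(responses):
    print(f"{value}: n={freq} pct={pct:.1f} cumulative={cum}")
```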

### G

*goals*

Specific objectives or ends sought by respondents that may be a topic for survey research measurement within the broader topic category of “needs.”

*go errors*

Errors that result when a decision maker goes ahead with a course of action and it proves to be costly or unsatisfactory.

*grand mean*

The arithmetic mean of the dependent variable for all of the cases in an analysis of variance, as opposed to the “group means,” including only the cases in each category of the independent variable.

*graphic data description*

Portrayal of data distributions for individual survey variables with charts, graphs, or data plots rather than numeric tables.

*graphic scale*

Any scale with scale points portrayed as pictures or diagrams, rather than numbers, letters, or words.

### H

*hand tabulation*

The statistical computations and analysis of survey data without the use of computer analysis routines, usually confined to frequency tables, cross-tabulation tables, and the statistics that can be obtained from them without the processing of continuous numeric variables with many scale values.

*happy-sad face scale*

One form of pictorial scale showing a series of simple sketches of faces with the mouth of each turned up in a smile to show pleasure or down to show displeasure in varying degrees, often used for surveying young children who may not be able to understand numeric or verbal scales.

*histogram*

A horizontal bar chart showing the frequency or percentage distribution of response for a survey item in graphic form, often generated by computer analysis routines if requested.

*homoskedasticity*

The required condition of a scatterplot of data for regression analysis, where the data points are spread around the regression line in approximately equal amounts at any given point on the line, forming an even corridor of data, as opposed to heteroskedasticity, such as a “funnel shaped” pattern of data around the regression line.

*hostility bias*

Opposition, resentment, or resistance to an interviewer, survey sponsor, response task, or situational factor that negatively affects answers to one or more specific survey questions.

*hypothesis*

A conjectural statement about the value of some variable or the relationship between variables that will be tested and ultimately accepted or rejected on the basis of statistical analysis of survey results, most often used in formal scientific or academic research.

### I

*images*

The generalized or synthesized picture representation of some object, person, or idea held in the minds of people, based on partial information from previous experiences, perceptions, or evaluations, and often one of the major topics of survey research.

*implicit scale*

Any scale that need not be explicitly stated in the question or presented verbally or visually to respondents because they automatically understand how they are to respond, such as asking one’s age with the implicit understanding that it will be expressed in years since birth.

*independence*

The condition between two variables or measurements where information about one gives no indication of the likely value of the other because they are unrelated.

*independent variable*

The variable that is viewed as influencing, affecting, or determining the values of another variable when they are regarded as being in a potential cause-and-effect relationship.

*inferential statistics*

Any statistical measure that can be used to make inferences or generalizations about a population, with a known level of probability, based on the values or conditions of a sample.

*information needs*

The specific categories of information required by those sponsoring pragmatic survey research, in order to make decisions or choices or to set policy, or required by those conducting academic research to test a theoretical or conceptual hypothesis and enhance some body of knowledge or literature.

*instrumentation*

The survey questionnaire and other devices, such as cover letters, rating cards, and the like, used to obtain data from respondents.

*instrumentation bias*

The tendency for some aspect of the survey instruments to cause respondents to answer in a particular way or systematically “push” or “pull” the survey results in some given direction, thus reducing the survey validity.

*instrumentation error*

The tendency for some aspect of the survey instruments to randomly affect the data in such a way that they are not true representations of the respondent opinions or conditions, but there is no specific direction or systematic influence, so that survey reliability is reduced.

*integer*

A whole number, as opposed to a fractional or decimal value.

*interpretation error*

Error that results when interviewers are asked to interpret responses during the interview or make judgments about the responses.

*interrogation error*

Errors that occur when questions are expressed differently from one respondent to the next.

*interval scale*

Any scale where the intervals between scale points are equal, even though there may be no zero value or zero does not represent a complete absence of the thing measured, such as the Fahrenheit scale.

*interviewing bias*

The tendency for some aspect of the interviewing to cause respondents to answer in a particular way or systematically “push” or “pull” the survey results in some given direction, thus reducing the survey validity.

*interviewing error*

The tendency for some aspect of the interviewing to randomly affect the data so they don’t truly represent the respondents’ opinions or conditions, thus reducing survey reliability.

### J

*judgment sample*

A sample selected on the basis of the researcher’s judgment about what units or respondents should and should not be included, as opposed to random selection.

### K

*knowledge component*

One of the three main components of attitudes, consisting of the facts or beliefs the individual holds about the topic of the attitude.

*kurtosis*

A statistical measure of the shape of a distribution that indicates whether the curve is more peaked or more flat than a normal, bell-shaped curve and how much so.

### L

*level of confidence*

The specific probability of obtaining some result from a sample if it did not exist in the population as a whole, at or below which the relationship will be regarded as statistically significant.

*lifestyle*

The general pattern of daily behavior, activities, choices, and preferences for an individual or family that might be used to characterize them and distinguish them in meaningful ways from those following a different pattern.

*Likert scale*

A type of scaling where the respondents are presented with a series of statements, rather than questions, and asked to indicate the degree to which they agree or disagree, usually on a five-point scale.

*linear, numeric scale*

A scale used when items are to be judged on a single dimension and arrayed on a scale with equal intervals, providing both absolute measures of importance and relative measures, or rankings, if responses among the various items are compared.

### M

*mail data collection*

The mailing of questionnaires and their return by mail by the designated respondents.

*maximum*

The highest value for a variable that was actually obtained from a sample, often reported by analysis routines and used by analysts to assess range and likelihood of outliers or ceiling effects.

*mean*

The most common average or measure of central tendency, providing an indication of the most typical or representative value for the sample and the population as a whole, within a given confidence level.

*mean squares*

A value usually computed for analysis of variance to form an F-ratio to assess statistical significance, consisting of the total of the squared deviations from the mean for each data point, or sums of squares, divided by the number of cases or degrees of freedom.

*measures of spread*

Statistical indications of the dispersion of the data around the central point, such as the standard deviation.

*median*

An average or measure of central tendency, consisting of the value the middle case would take on if the cases were arrayed from lowest to highest value for the variable and the scale represented a continuum or could include an infinite number of points, used in preference to the mean for ordinal level data and often preferred to the mean for distributions that are highly skewed to one side or have outlying values.

*mental set*

The existing frame of mind, point of view, or train of thought adopted by a respondent at a given moment, used to judge a series of survey questions or items.

*midpoint*

The middle point on a scale with an odd number of scale points, sometimes reflecting neutrality on the spectrum of response.

*minimum*

The lowest value for a variable that was actually obtained from a sample, often reported by analysis routines and used by analysts to assess range and likelihood of outliers or floor effects.

*minimum expected cell frequency*

The lowest expected cell frequency in a cross-tabulation table, that must be at least five for valid use of the chi-square statistic to assess the significance of the relationship, computed by identifying the smallest row frequency and column frequency, multiplying the two, and dividing by the total frequency for the table.
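The shortcut described above can be sketched as follows. The function name and the table are hypothetical; the threshold of five comes from the definition.

```python
def minimum_expected_frequency(observed):
    """Smallest row total * smallest column total / table total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    return min(row_totals) * min(col_totals) / sum(row_totals)

observed = [[12, 3], [8, 2]]  # hypothetical cross-tabulation
lowest = minimum_expected_frequency(observed)
print(lowest, "ok for chi-square" if lowest >= 5 else "too small for chi-square")
```

The shortcut works because every expected frequency is a row total times a column total divided by the same table total, so the smallest product gives the smallest cell.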

*mode*

The only average appropriate to indicate the most typical case for a distribution of nominal data, consisting of the category with the highest frequency, and also representing the location of the peak or high point in a distribution of continuous numeric data with many scale values.

*motives*

The impetus or urge causing a person to take some action, that may be a topic for survey research measurement within the broader topic category of “needs.”

*multiple rating list*

A survey item format used to save space and response time, designed so many items can be rated on the same scale.

*multiple rating matrix*

A survey item format used to save space and response time, designed so the same scale is used to rate multiple items on several dimensions.

*multiple regression*

Linear regression that uses a single dependent variable and two or more independent variables in the same analysis, in contrast to simple, linear regression using only one independent variable, so that both the effect of each independent variable and the effects of interactions among independent variables can be gauged.

*multiple response item*

A structured, multiple-choice survey question that allows the respondent to choose as many response categories as apply.

*multivariate analysis*

Statistical analysis techniques to assess the relationships or patterns among more than two variables simultaneously, including such methods as multiple regression, factorial analysis of variance, analysis of covariance, factor analysis, cluster analysis, multidimensional scaling, and the like.

*mutually exclusive categories*

Response categories defined to ensure a unique association between any given answer and only one category or alternative, so no response can fit into two or more categories.

### N

*namelist*

A listing of names and addresses, often used for mail surveys, that may be accumulated or acquired from one of many namelist brokerage firms who accumulate and manage such lists for people with particular characteristics or in certain locations.

*nay-sayer*

An individual or respondent who persistently tends to respond in the negative more often than others, regardless of the question.

*nay-sayer bias*

The tendency for a set of survey results to be generally and artificially negative on a series of items because all items are inclined in the same direction, toward the positive or toward the negative, and negative responses to the earlier items were generalized to the remaining ones, thus reducing the validity.

*need*

A persistent or fundamental requirement of the individual in order to maintain physical, psychological, or social well-being, usually fluctuating over time in its degree of satisfaction, and often a topic for survey research measurement.

*no-go errors*

Errors that result when a decision maker either fails to take some action that would have positive results, or ignores an alternative that would be more positive, choosing some less positive course.

*nominal scale*

A scale that uses numbers, letters, or symbols only as the names of independent categories, so that the scale values do not stand in any ordered relationship to one another.

*nonprobability sampling*

A nonrandom sampling design such as convenience sampling, where the probability of selecting a given sampling unit from the population is neither known nor equal to the probability of selecting any other unit.

*nonrespondents*

Those in the population who were included in the sample but failed to respond because they refused or could not be reached, or for some other reason.

*nonresponse bias*

A systematic effect on the data reducing validity that results when those with one type of opinion or condition fail to respond to a survey more often than do others with different opinions or conditions.

*normal curve*

Any distribution that conforms exactly or very closely to a normal distribution.

*normal distribution*

A continuous, symmetrical distribution that forms a curve with a particular shape defined by a mathematical equation, often referred to as a “bell-shaped” curve, valuable as a statistical reference because the precise areas under the curve can be computed or obtained from statistical tables.

*N-size*

A commonly used term for the sample size or the number of cases included in an analysis or tabular report.

*nth name sampling*

A sampling design where the number of units in the sample frame is first divided by the desired sample size to obtain the value of n, a value between one and n is randomly selected as a starting point or first case to be selected, and then every nth name or unit is selected, yielding a random sample.
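
The selection steps described in this entry can be sketched in a few lines of Python. This is only an illustration: the function name `nth_name_sample`, the 100-name frame, and the fixed random seed are assumptions made for the example, not part of the definition.

```python
import random

def nth_name_sample(frame, sample_size, rng=None):
    """Select every nth unit from a sample frame after a random start."""
    rng = rng or random.Random()
    n = len(frame) // sample_size      # frame size divided by desired sample size
    if n < 1:
        raise ValueError("sample_size exceeds the size of the sample frame")
    start = rng.randint(1, n)          # random start between 1 and n, inclusive
    # Convert the 1-based start to a 0-based index, then take every nth unit.
    return frame[start - 1::n][:sample_size]

# Hypothetical sample frame of 100 names.
frame = [f"name_{i:03d}" for i in range(1, 101)]
sample = nth_name_sample(frame, 10, random.Random(42))
```

With a frame of 100 and a desired sample of 10, the interval n is 10, so the selected names are always exactly 10 positions apart.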

*null hypothesis*

The hypothesis stipulating there will be no significant relationship between two variables, which can be tested with survey or other data and rejected in favor of the alternative hypothesis if the relationship proves significant, and most often used in scientific or academic research.

*numeric item*

Any survey item with scale numbers that are meaningful and stand in an ordered relationship to one another, such as those from ordinal, interval, or ratio scales.

### O

*online focus groups*

Participants sign in at a private chatroom on the Internet where they either join in a focused discussion with a moderator or read messages on a bulletin board and post their observations and opinions over a period of several days or weeks.

*open-ended question*

An unstructured survey question that does not include a list of alternative answers, so that respondents must answer in their own words.

*order bias*

The tendency for the order in which survey items are listed to affect respondents’ answers in some systematic way, reducing validity.

*ordinal scale*

A particular type of scale where the response alternatives define an ordered sequence, so the first is less than the second, the second less than the third, and so on, yielding ordinal level data where the intervals between scale points are not known or necessarily equal.

*outlier*

An extreme case or data point that stands well above or well below its nearest neighbor and is highly atypical of the distribution as a whole.

### P

*paired comparison scale*

A type of scale that presents respondents with one pair of alternatives at a time, instructing them to pick just one from each pair, yielding dichotomous, nominal data.

*paired t-test*

A technique for assessing the statistical significance of differences in mean values when both are obtained from the same respondents, and are therefore paired with one another.
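
A minimal sketch of the usual computation, assuming the test statistic is the mean of the paired differences divided by its standard error. The function name `paired_t` and the before/after ratings are hypothetical.

```python
import math
import statistics

def paired_t(first, second):
    """Return the t statistic and degrees of freedom for paired measurements."""
    # One difference per respondent, since both values come from the same case.
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)   # standard error of the mean difference
    return statistics.mean(diffs) / se, n - 1

# Hypothetical ratings from the same five respondents on two occasions.
before = [3, 4, 2, 5, 3]
after = [4, 6, 3, 5, 5]
t_value, df = paired_t(before, after)
```

The resulting t value would then be compared with the t-distribution for the given degrees of freedom to obtain the significance level.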

*panel data collection*

A survey of a group of preselected respondents who agreed to be panel members on a continuous basis for a given period of time and provided initial demographic data, allowing selection of special groups and permitting the use of surveys to monitor responses over time.

*parameter*

A coefficient or value for the population that corresponds to a particular statistic from a sample and is often inferred from the sample.

*percentage distribution table*

A table listing the percentage of respondents selecting each response category or scale point.

*percentile*

An indication of the position of a case or value within a distribution, based on the number of cases with a lesser value out of a total of 100 cases.
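
As an illustration, the position of a value can be computed as the percentage of cases with a lesser value. The function name `percentile_rank` and the scores below are assumptions for this sketch; statistical packages may handle ties and interpolation differently.

```python
def percentile_rank(values, x):
    """Percentage of cases in `values` that are strictly less than x."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)

# Hypothetical test scores for ten cases.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
rank = percentile_rank(scores, 85)   # six of the ten scores are lower
```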

*peripheral devices*

Units of computer hardware, other than the central processing unit, used to input, store, and output data or information.

*personal interview*

Data collection accomplished with the interviewer in the presence of the respondent, so that they have visual contact, as opposed to telephone interviewing.

*pictorial scale*

Any scale with scale points portrayed as pictures or diagrams rather than numbers, letters, or words.

*pie chart*

A method for portraying results graphically, consisting of a circle divided by lines from the center to the perimeter, so that the angles between the lines and therefore the size of the “pieces” represent proportions.

*pilot survey*

A brief preliminary survey, often using a small, convenience sample, conducted to test the survey instruments and data collection method before the project details are finalized and the larger, formal survey conducted.

*population*

The definition of all those people or elements of interest to the information seekers and from among whom the sample will be selected.

*population parameters*

Values or coefficients such as the mean or variance that describe the distribution of a variable in the population, often estimated or inferred based on the corresponding values of sample statistics.

*postcoding*

The process of examining completed survey questionnaires, choosing response categories for items not precoded, assigning code values to them on the documents, and recording codes and category labels in a codelist.

*precision*

The range of the confidence interval at a given level of probability, expressed in absolute terms or as a percentage of the mean value.

*precode*

Assigning code values to the categories of structured questions and listing them for printing on the questionnaire prior to data collection.

*preferences*

Predetermined choices by respondents from among alternative goods, that may be a topic for survey research measurement within the broader topic category of “needs.”

*prestige*

A condition of superior status, rank, or distinction relative to one’s peers or society in general, constituting a basic human need, sometimes causing respondents to react to questions in ways they perceive to be more prestigious.

*pretest*

Preliminary trial of some or all aspects of the sampling design, survey instrumentation, and data collection method, to be sure there are no unanticipated difficulties or problems.

*primary data*

Data collected for a particular project to meet specific information needs, as opposed to data that already exist for general use or as the result of inquiries for other purposes.

*probability sampling*

Any sampling design where every element in the population has either an equal probability of selection, as with random sampling, or a given probability of being selected that is known in advance and used in analysis to assess significance.

*process editing*

Examining survey data with computer processing routines to be sure the data conform to the data file format and that all values are expressed in the proper form and are within the range of the scale for each item.

*product-moment correlation*

The statistical method of correlation that requires interval or ratio level data and is not appropriate for ordinal scale data, which require rank correlation.

### Q

*qualification*

The process of inspecting or interrogating potential respondents to be sure they are qualified to respond or that they fit the quota specifications for a particular interviewer.

*qualitative research*

Research obtaining data in the form of words or other indications that do not lend themselves to quantitative analysis and whose analysis and interpretation depend on subjective judgments by experts.

*quantitative research*

Research obtaining data in a form that can be represented by numbers, so that quantities and magnitudes can be measured, assessed, and interpreted with the use of mathematical or statistical manipulation.

*questionnaire*

The basic survey instrument containing instructions, questions, or items, response alternatives where appropriate, and specific means for recording responses.

*quota*

A set number or proportion of respondents with given characteristics or attributes sought in a sample or assigned to specific interviewers or field-workers.

*quota sample*

Any sampling design that requires a set number or proportion of respondents with given characteristics or attributes.

*quota specification*

The listing of quota requirements for the entire sample or for specific interviewers, including identification of the characteristics that define the quota, the manner in which they are to be ascertained, the method of qualification of respondents, and the number or proportion of respondents who are to have each attribute or combination of attributes.

### R

*random digit dialing*

A sampling system for telephone surveys where all telephone numbers in households or all that have one of a given set of three-digit telephone number prefixes are regarded as the sample frame, and seven-digit or four-digit numbers are generated and dialed manually or automatically to obtain the sample.
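
The generation of four-digit numbers within a given set of prefixes might be sketched as follows. The function name `random_digit_numbers` and the prefixes are hypothetical, and a real system would also need to screen out nonworking and nonhousehold numbers.

```python
import random

def random_digit_numbers(prefixes, count, rng=None):
    """Generate telephone numbers as a sampled prefix plus four random digits."""
    rng = rng or random.Random()
    return [
        f"{rng.choice(prefixes)}-{rng.randrange(10000):04d}"   # e.g. "555-0042"
        for _ in range(count)
    ]

# Hypothetical three-digit prefixes that define the sample frame.
prefixes = ["555", "556"]
numbers = random_digit_numbers(prefixes, 5, random.Random(7))
```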

*random error*

The result of extraneous factors, such as sampling error, affecting the survey results in no systematic pattern, so the answers are not consistently pushed or pulled in one specific direction.

*random sampling*

A sampling design that seeks to select respondents from the population or sample frame in a completely random fashion, so every respondent has an equal probability of being selected.

*range*

A measure of the spread in the distribution of data for a variable, defined as the maximum minus the minimum, plus one.

*rank correlation*

The statistical method of correlation appropriate when one or both of the variables are from only ordinal level scales, sometimes called Spearman rank correlation.

*rank order scale*

A scale essentially the same as the forced ranking scale.

*rating card*

A card or sheet containing a rating scale that is handed to or shown to respondents during personal interviews and from which they pick their response alternatives by number or letter.

*rating scale*

Any scale from which respondents choose values that represent their responses, ordinarily yielding interval or ratio level data.

*ratio scale*

Any scale that has the same characteristics as an equal interval scale, plus the fact that zero represents the complete absence of the thing being measured, so that a ratio of one scale value to another has meaningful and legitimate interpretation.

*raw data*

Data that has not been transformed or processed, although it may have been edited and transferred from one medium to another.

*recode*

The process of systematically assigning new code values to variables, based on the original values, usually done in order to group data into larger categories to obtain fewer code values.
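
A brief sketch of recoding, assuming the original variable is age in years and the goal is three broader age groups. The function name `recode`, the bin boundaries, and the new code values are all hypothetical.

```python
def recode(value, bins):
    """Map an original value to a new code using (low, high, new_code) bins."""
    for low, high, new_code in bins:
        if low <= value <= high:
            return new_code
    return None   # values outside every bin are left uncoded

# Hypothetical grouping of age in years into three larger categories.
AGE_BINS = [(18, 34, 1), (35, 54, 2), (55, 120, 3)]
ages = [19, 42, 67, 35, 54]
recoded = [recode(age, AGE_BINS) for age in ages]
```

Five distinct original values are reduced here to three code values, which is the usual purpose of a recode.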

*record format*

The specification of where the data field for each variable is to be keyed or recorded in a data file, including both the column(s) and the record numbers within a single case.

*recording error*

Error that may occur when interviewers are required to write down verbal answers by respondents, typically caused by unsatisfactory abbreviation when there is insufficient time to record entire verbatim answers.

*regression analysis (simple linear regression)*

A statistical measure of the effect of one interval or ratio level variable on another, used both to indicate the statistical significance of the relationship and to generate an equation to predict or estimate the value of the dependent variable for a new case, based only on the known value of the independent variable.

*regression equation*

The equation generated by linear regression analysis, expressed as a coefficient that can be multiplied by the value of the independent variable for a new case and a constant to be added to predict the unknown value of the dependent variable.
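
To make the form of the equation concrete, the sketch below fits a coefficient and constant by ordinary least squares and uses them to predict a new case. The function name `fit_line` and the data are assumptions for illustration only.

```python
def fit_line(x, y):
    """Least-squares regression coefficient (slope) and constant (intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    coefficient = sxy / sxx
    constant = mean_y - coefficient * mean_x
    return coefficient, constant

# Hypothetical data that lie exactly on the line y = 2x + 1.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
coefficient, constant = fit_line(x, y)

# Predicted value of the dependent variable for a new case with x = 6.
predicted = coefficient * 6 + constant
```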

*relative frequency*

A term that is sometimes used to refer to the percentages listed in a frequency table, indicating the proportion of the sample in each category.

*reliability*

The degree to which the survey results are free from random error, as opposed to systematic bias, often expressed in terms of confidence intervals or confidence levels.

*report generation*

The process of arranging and condensing tabular survey results and expressing the written interpretations of the findings to provide information to those seeking it.

*responding sample*

The number of cases with valid responses to the survey or to an individual survey item, as opposed to the total sample size.

*response bias*

The tendency for some aspect of the response task, such as annoyance or a desire to please the interviewer, to cause respondents to answer in a particular way or systematically “push” or “pull” the survey results in some given direction, thus reducing the survey validity.

*response error*

The tendency for some aspect of the response task, such as boredom, inattention, or fatigue, to randomly affect the data in such a way that they are not true representations of the respondent opinions or conditions, but there is no specific direction or systematic influence, so that survey reliability is reduced.

*response option error*

Error resulting because interviewers read the alternative answers to respondents when they shouldn’t do so or fail to read them when they should.

*response rate*

The percentage of those included in the sample who responded to the survey and provided usable, completed questionnaires.

*r-square (r², Rsq)*

The coefficient of determination obtained during regression analysis, indicating the proportion of variance in the dependent variable that is “explained” by the values of the independent variable.
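
For simple linear regression with one independent variable, this proportion equals the squared product-moment correlation between the two variables. The sketch below computes it that way; the function name `r_square` and the data are hypothetical.

```python
def r_square(x, y):
    """Coefficient of determination for a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    # Squared correlation: explained share of the variance in y.
    return sxy ** 2 / (sxx * syy)

# Hypothetical data for five cases.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
rsq = r_square(x, y)   # about 0.6, so roughly 60% of the variance is "explained"
```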

*runs test*

A statistical process used in connection with regression analysis to determine the probability that the data are actually linear or arrayed evenly around a straight line if the data were plotted, by counting the “runs” of successive data points that are all on one side of the regression line.

### S

*sample*

The number and/or identification of respondents in the population who will be or have been included in the survey.

*sample frame*

A listing that should include all those in the population to be sampled and exclude all those who are not in the population.

*sample selection bias*

Any form of bias resulting from the selection of respondents in a manner that deviates from random selection, so that some types of respondents are over- or underrepresented in the sample.

*sample unit*

The smallest unit of the sample to be surveyed or the unit that will constitute one case for analysis, ordinarily one respondent or questionnaire.

*sampling design*

The specification of the sample frame, sample size, and the system for selecting and contacting individual respondents from the population.

*sampling error*

The degree to which the results from the sample deviate from those that would be obtained from the entire population, because of random error in the selection of respondents and the corresponding reduction in reliability.

*scale interpretation error*

Error associated with the use of rating cards where respondents answer with the name of a category, rather than its number or code, and the interviewer records the wrong code value because the category names are not listed on the questionnaire.

*scatterplot*

A graphic plot of the data points for two variables, usually generated on request by analysis routines during regression analysis, so that each data point is plotted horizontally according to the value of the independent variable and vertically according to the value of the dependent variable.

*secondary data*

Data initially acquired for general use or for some purpose other than the information requirements of the project at hand.

*selection bias*

A systematic effect on the data resulting from the selection of respondents in a manner that deviates from random selection, so that some types of respondents are over- or underrepresented in the sample.

*self-selection bias*

A systematic effect on survey results because some respondents voluntarily participate while others decline or refuse, so that those with certain opinions or conditions are under- or overrepresented in the sample.

*semantic differential scale*

A scaling device that lists several pairs of bipolar adjectives, usually separated by a seven-point scale, and instructs respondents to rate the topic or object on each, ordinarily used to measure image and provide a profile.

*semantic distance scale*

A scale listing several adjectives or phrases, where respondents are instructed to indicate how well each describes some object or topic based on a linear, numeric scale, often used to measure image and obtain a profile in much the same manner as the semantic differential scale is used.

*sequential sample*

A sampling design that requires the collection of data in increments with a relatively small sample at each stage, so that analysis can be performed after each stage to determine when the sample is large enough to provide the required level of confidence or reliability.

*shape*

The form or outline of a data distribution arrayed from lowest to highest value, portrayed in a data plot or described by statistical values such as the coefficients of skewness and kurtosis.

*sight-edit*

The visual examination of the completed questionnaires immediately after data collection to determine if they are sufficiently complete and usable.

*significance level*

The probability that the magnitude of the relationship might result in a sample of that size purely from sampling error if, in fact, it did not exist in the population.

*simple random sample*

A sampling design that seeks to select respondents from the population or sample frame in a completely random fashion, so every respondent has an equal probability of being selected, and no clustering or stratification methods are used.

*single response item*

A structured, multiple-choice survey question that requires the respondent to choose only one response category, such as one that represents the “best” or “favorite.”

*skewness*

A designation of the shape of a distribution, indicating the degree of symmetry or the degree and direction that the mode or peak “leans” toward one side, with only a few values extending well out toward the tail on the other.

*slope*

In regression analysis, the “rise over the run” when the dependent variable is plotted on the vertical axis of a scatterplot, or the amount of increase or decrease in the units of the dependent variable for each unit of the independent variable, indicated by the regression coefficient.

*social desirability*

The tendency for respondents to give answers to survey questions that are consistent with what the society believes is right, proper, correct, or acceptable, creating bias in the results whenever the true answers are suppressed to meet social norms.

*spread*

The range and degree of dispersion or variance in the distribution of data for a survey variable, often described by the standard deviation of the distribution.

*stair-step scale*

One type of pictorial scale graphically showing the scale points as a series of steps, appropriate for use with young children or other respondents who might have difficulty understanding a numeric or verbal scale.

*standard deviation (S.D.)*

A computed measure of spread or dispersion in a distribution of data, based on the squared deviations of each point from the mean, that can be used to indicate the proportion of data within certain ranges of scale values when the distribution conforms closely to the normal curve.

*standard error (S.E.) of the estimate*

A computed value in regression analysis based on sample size and variance around the regression line, determining the confidence interval around a predicted value of the dependent variable at a given probability.

*standard error (S.E.) of the mean*

A computed value based on the size of the sample and the standard deviation of the distribution, indicating the range within which the mean of the population is likely to be from the mean of the sample at a given level of probability.
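
A minimal sketch of the computation, assuming the common formula of the sample standard deviation divided by the square root of the sample size. The function name, the ratings, and the 1.96 multiplier (a normal approximation for roughly 95% probability) are assumptions for illustration; with a sample this small, a value from the t-distribution would ordinarily be used instead.

```python
import math
import statistics

def standard_error_of_mean(values):
    """Sample standard deviation divided by the square root of the sample size."""
    return statistics.stdev(values) / math.sqrt(len(values))

# Hypothetical ratings from ten respondents on a five-point scale.
ratings = [4, 5, 3, 4, 5, 4, 3, 4, 5, 3]
mean = statistics.mean(ratings)
se = standard_error_of_mean(ratings)

# Approximate 95% confidence interval around the sample mean.
low, high = mean - 1.96 * se, mean + 1.96 * se
```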

*statistic*

Some value computed from sample data that may also be used to make inferences about the corresponding value or “parameter” for the whole population.

*statistical analysis*

The process of computation and manipulation of sample data in order to suppress the detail and make relevant facts and relationships more visible and meaningful, and to generate statistics in order to make inferences about the population as a whole.

*statistical inference*

The process of generalizing information from a sample to the population as a whole by estimating population parameters, based on their corresponding statistical values from the sample.

*statistical significance*

An explicit assumption by the analyst that a relationship revealed in the sample data also exists in the population as a whole, based on the relatively small probability that it would result only from sampling error if it did not exist in the population.

*stratified sampling*

A sampling design that divides the population into specific strata containing certain types of respondents, then selects subsamples of the required size for each stratum.
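
The two steps in this entry, dividing the frame into strata and then drawing a subsample of the required size from each, might be sketched as follows. The function name `stratified_sample`, the urban/rural strata, and the subsample sizes are hypothetical.

```python
import random

def stratified_sample(frame, stratum_of, sizes, rng=None):
    """Draw a random subsample of the required size from each stratum."""
    rng = rng or random.Random()
    strata = {}
    for unit in frame:
        # Group every unit in the frame under its stratum label.
        strata.setdefault(stratum_of(unit), []).append(unit)
    return {name: rng.sample(strata[name], k) for name, k in sizes.items()}

# Hypothetical frame of 100 respondents: 75 urban and 25 rural.
frame = [(f"r{i:03d}", "urban" if i % 4 else "rural") for i in range(100)]
sample = stratified_sample(
    frame,
    stratum_of=lambda unit: unit[1],
    sizes={"urban": 6, "rural": 4},
    rng=random.Random(1),
)
```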

*stratum*

The singular form of “strata,” indicating one level of a stratified sampling design.

*structured question*

Any question that lists or prescribes the response alternatives from which respondents must choose, such as multiple-choice or true/false questions and items accompanied by rating scales.

*subsample*

One part of an entire sample that is singled out for special attention or analysis, often defined in terms of a demographic characteristic.

*sum*

The total of a series of values or the process of adding them.

*sum of squares (S.S.)*

A value computed for several forms of statistical analysis, such as computing the standard deviation, analysis of variance, regression analysis, and the like, where some mean is subtracted from each data point, this deviation is squared, and the squared values are added for all the cases.

*survey*

A research technique where information requirements are specified, a population is identified, a sample selected and systematically questioned, and the results analyzed, generalized to the population, and reported to meet the information needs.

*systematic*

A relationship or effect that is not random, but rather one that is consistent or in a given “direction.”

*systematic bias*

A redundant term, since bias is defined as systematic effect, but commonly used to emphasize the nonrandom nature of a bias or to distinguish bias from random error.

*systematic sampling*

Another term for nth name sampling, where the number of units in the sample frame is first divided by the desired sample size to obtain the value of n, a value between one and n is randomly selected as a starting point or first case to be selected, and then every nth name or unit is selected, yielding a random sample.

*system hardware*

Any mechanical or electronic device linked in a computer system, including the central processing unit and “peripheral” devices such as printers and external disk drives.

### T

*t-distribution*

A symmetrical statistical probability distribution, slightly flatter and wider than the standard normal distribution, often listed in statistical reference tables and used to determine the significance of paired t-tests.

*telephone interview*

Interview data collection using the telephone to contact respondents, as opposed to personal interviewing where respondents are in the presence of the interviewer and have visual contact.

*termination bias*

Bias resulting when respondents of a certain type or with a certain orientation terminate their participation in a continuing study, such as a panel study, at greater rates than others.

*threat*

A source of bias or resistance resulting when respondents find survey items or topics intimidating or threatening, such as questions about financial matters, or instructions, questions, or scales that are confusing, suggesting the respondent is ignorant or incompetent.

*t-test*

A statistical method of assessing the significance of differences between two mean values for the same variable, as opposed to a paired t-test of values for two different variables for the same cases, yielding the same basic information as an analysis of variance with only two categories for the independent variable.

*Type I error*

In academic or scientific research (as opposed to pragmatic research), the probability of rejecting the “null hypothesis” that no relationship exists, and, therefore, accepting the “alternative hypothesis” that there is a relationship, when, in fact, no relationship exists in the population as a whole.

*Type II error*

In academic or scientific research (as opposed to pragmatic research), the probability of not rejecting the “null hypothesis” that no relationship exists, and, therefore, rejecting the “alternative hypothesis” that there is a relationship, when, in fact, a relationship does exist in the population as a whole.

### U

*unaided recall*

A form of questioning respondents about what they remember, where the facts, objects, or events are not listed or presented to them to aid their recollection, as with aided recall.

*unbiased*

Free of bias or unaffected by any extraneous factor that would systematically affect the values or results.

*unbiased estimate*

A statistical term that indicates the value of a particular statistic, such as the mean, obtained from the sample, will be exactly equal to the corresponding value of the population parameter, on the average over an infinite number of such samples.

*unconditional branching*

Instructions that control the “path” of interrogation by directing all respondents who reach a specific place in the questionnaire, such as the end of a special section, to go to another location in the questionnaire, rather than merely continuing from that point.

*unimodal*

Having only one modal value for a distribution of categorical data or only one peak or mode for a continuous distribution.

*univariate analysis*

The statistical description or the analysis of just one variable at a time.

*unstructured question*

An “open-ended” survey question where the alternative answers are not listed and respondents must provide the answers in their own words.

### V

*validation*

A term commonly but incorrectly used by survey researchers and data collection agencies to indicate “verification” of responses.

*validity*

The degree to which the survey data or results are free from both systematic bias and random error.

*variable*

A measurement unit that can take on several different values, usually used to refer to the distribution of data for one survey item.

*variance*

A statistical term referring to the sum of the squared deviations of each data point from the mean (the sum of squares), divided by the number of cases or degrees of freedom (the mean squares), and also the value from which the standard deviation is computed by extracting the square root.
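
The computation in this entry can be followed step by step in a short sketch. The function name `variance`, the `sample` flag, and the data are assumptions for illustration; dividing by the degrees of freedom (number of cases minus one) is the usual choice for sample data.

```python
import math

def variance(values, sample=True):
    """Sum of squares divided by the degrees of freedom or the number of cases."""
    mean = sum(values) / len(values)
    sum_of_squares = sum((v - mean) ** 2 for v in values)
    divisor = len(values) - 1 if sample else len(values)
    return sum_of_squares / divisor

# Hypothetical data for eight cases; the mean is 5 and the sum of squares is 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
var_cases = variance(data, sample=False)   # divided by the number of cases
var_df = variance(data)                    # divided by the degrees of freedom
sd = math.sqrt(var_cases)                  # standard deviation from the variance
```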

*verbal frequency scale*

A particular type of verbal scale where the frequency of an event to be indicated by the respondent is expressed verbally, ordinarily with the words “Always, Often, Sometimes, Rarely, and Never,” rather than in numeric quantities.

*verbal scale*

Any scale whose points are expressed in words or whose numeric code values are labeled throughout the scale with words.

*verification*

The process of checking with respondents after they have been interviewed to be sure the person was actually interviewed and that the interview was done correctly and completely when and where it was supposed to be, and commonly but incorrectly called “validation.”

*video-streaming*

A new form of digital communication sometimes used for focus group recording and reporting, in which the video and audio are simultaneously transmitted to a host server on the Internet. The sponsor’s personnel authorized to view the group are given the time schedule, sign-on instructions, and password, so they can observe it from their own computer or conference room whenever they wish.

*visibility bias*

One form of selection bias, where a particular type of respondent is over- or underrepresented in the sample because they are more visible than others with different characteristics.

### W

### X

### Y

*yea-sayer*

An individual or respondent who persistently tends to respond in the affirmative more often than others, regardless of the questions.

*yea-sayer bias*

The tendency for a set of survey results to be generally and artificially positive on a series of items because all items are inclined in the same direction, toward the positive or toward the negative, and positive responses to the earlier items were generalized to the remaining ones, thus reducing the validity.

### Z

