Overview of the Behavioral Risk Factor Surveillance System, 1997 Survey Data
1. BACKGROUND
The Behavioral Risk Factor Surveillance System (BRFSS) is a collaborative project of the
Centers for Disease Control and Prevention (CDC) and U.S. states and territories. The BRFSS,
administered and supported by the Behavioral Surveillance Branch (BSB) of the CDC, is an
on-going data collection program designed to measure behavioral risk factors in the adult
population 18 years of age or over living in households. The BRFSS was initiated in 1984, with
15 states collecting surveillance data on risk behaviors through monthly telephone interviews.
The number of states participating in the survey increased, so that by 1997, 50 states, the District
of Columbia, Puerto Rico, Guam, and the Virgin Islands were participating in the BRFSS.
In this document, the term "state" is used to refer to all areas participating in the surveillance
system, including the District of Columbia and the Commonwealth of Puerto Rico.
The objective of the BRFSS is to collect uniform, state-specific data on preventive health
practices and risk behaviors that are linked to chronic diseases, injuries, and preventable
infectious diseases in the adult population. Factors assessed by the BRFSS include tobacco use,
physical activity, dietary practices, safety-belt use, and use of cancer screening services, among
others. Data are collected from a random sample of adults (one per household) through a
telephone survey.
Field operations for the BRFSS are managed by the health departments under guidelines
provided by the BSB. These health departments participate in the development of the survey
instrument and conduct the interviews either in-house or through use of contractors. The data are
transmitted to the National Center for Chronic Disease Prevention and Health Promotion's
Behavioral Surveillance Branch at CDC for editing, processing, weighting, and analysis. An
edited and weighted data file is provided to each participating health department for each year of
data collection, and summary reports of state-specific data are prepared by the staff of the BSB.
Health departments use the data for a variety of purposes, including identification of
demographic variations in health-related behaviors, targeting services, addressing emergent and
critical health issues, proposing legislation for health initiatives and measuring progress toward
state and national health objectives.
The health characteristics estimated from the BRFSS pertain only to the adult population age 18
years and older living in households. As noted above, respondents are identified through
telephone-based methods. Although 95 percent of U.S. households have telephones, coverage
ranges from 87 to 98 percent across states and varies for subgroups as well. For example, persons
living in the South, minorities, and those in lower socioeconomic groups typically have lower
telephone coverage (see also Telephone Status by State in Table3.txt). No direct method of
compensating for non-telephone coverage is employed by the BRFSS; however,
post-stratification weights are used, and may partially correct for any bias caused by
non-telephone coverage. These weights adjust for differences in probability of selection and
nonresponse, as well as noncoverage, and must be used for deriving representative
population-based estimates of risk behavior prevalences.
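To illustrate why the weights matter, the following minimal sketch in Python contrasts an
unweighted and a weighted prevalence estimate using the final weight, FINALWT, defined in the
weighting section below. All numbers here are invented for illustration; this is not BRFSS
production code.

    # Invented example: four respondents reporting a risk behavior (1 = yes,
    # 0 = no), each paired with a final weight. The unweighted and weighted
    # prevalence estimates can differ substantially.
    records = [(1, 250.0), (0, 1200.0), (1, 300.0), (0, 900.0)]  # (response, FINALWT)

    unweighted = sum(r for r, _ in records) / len(records)
    weighted = sum(r * w for r, w in records) / sum(w for _, w in records)

    print(round(unweighted, 3))  # 0.5
    print(round(weighted, 3))    # 0.208 -- the population-based estimate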
2. DESIGN OF THE BRFSS
A. The BRFSS Questionnaire
The questionnaire has three parts: 1) the core component, consisting of the fixed, rotating, and
emerging core; 2) optional modules; and 3) state-added questions.
Core component. The fixed core is a standard set of questions asked by all states. It includes
queries about current health-related perceptions, conditions, and behaviors (e.g., health status,
health insurance, diabetes, tobacco use, selected cancer screening procedures, and HIV/AIDS
risks) and questions on demographic characteristics. The rotating core consists of two distinct
sets of questions, each asked in alternating years by all states, addressing different topics. In
1997, the rotating core items covered cholesterol, hypertension, injury, immunization, colorectal
screening, and alcohol use. The emerging core is a set of up to five questions that are added to the
fixed and rotating cores. Emerging core questions typically focus on issues of a "late breaking"
nature and do not necessarily receive the same scrutiny that other questions receive prior to being
added to the instrument. These questions are part of the core for one year and are evaluated
during the year or soon after it concludes to determine their potential value in future surveys.
Emerging questions for 1997 focused on health care coverage.
Optional CDC modules. These are sets of questions on specific topics (e.g., smokeless tobacco,
firearms) that states elect to use on their questionnaires. In 1997, 16 modules were supported by
CDC, including 3 that were rotating core topics not used in the 1997 core component. (See 1997
BRFSS Modules Used By States in Table297.txt.)
State-added questions. These are questions developed or acquired by participating states and
added to their questionnaires. State-added questions are not edited or evaluated by CDC.
Each year, the states and CDC agree on the content of the core component and optional modules.
For comparability, many questions are taken from established national surveys, such as the
National Health Interview Survey or the National Health and Nutrition Examination Survey.
This practice allows the BRFSS to take advantage of questions that may have been tested and
allows states to compare their data with those from other surveys. Any new questions proposed
as additions to the BRFSS must go through cognitive testing prior to their inclusion on the
survey. BRFSS protocol specifies that all states ask the core component questions without
modification; they may choose to add any, all, or none of the optional modules; and states may
add question(s) of their choosing at the end of the questionnaire.
Although CDC supported 16 modules in 1997, it is not feasible for a state to use them all. States
are selective with their choices of modules and state-specific questions to keep the questionnaire
at a reasonable length (though there is wide variation across states in the total number of
questions for a given year, ranging from a low of about 90 to 150 or more). New questionnaires
are implemented in January, and usually remain unchanged throughout the year. However, the
flexibility of state-added questions does permit additions, changes, and deletions at any time
during the year.
Annual Questionnaire Development
Before the beginning of the calendar year, CDC provides states with the text of the core
component and the optional modules that will be supported for the coming year. States select
their optional modules and choose any state-added question(s). Each state then constructs its
questionnaire. The core component is asked first, optional modules are asked next, and
state-added questions last. This ordering ensures comparability across states and follows CDC
protocol. Generally, the only changes allowed are the limited insertion of state-added questions
on topics related to core questions. Such exceptions are to be agreed upon in consultation with
CDC. However, even with these exceptions, the policy has not been followed in every instance.
Deviations from policy are noted in the comparability of data section of this document.
Once the content (core, modules, and state-added) of the questionnaire is determined by a state,
a paper version of the instrument is constructed and a copy sent to CDC. For states with
Computer Assisted Telephone Interview (CATI) systems, this copy is used for CATI
programming and general reference. For states with manual systems, the paper copy is used as a
prototype for a camera-ready master. The questionnaire is used without changes for one
calendar year. The topics included on the 1997 questionnaire are shown in BRFSS
Questionnaire Content Plan. If a significant portion of the state population does not speak
English, states have the option of translating the questionnaire into other languages. At the
present time, CDC provides only a Spanish version of the core questionnaire and optional
modules.
Each year, CDC provides Ci3 CATI programming, PC-EDITS programming, and state-specific
annual data tables for questions on the core and selected modules.
B. Sample Description
The BRFSS standard for participating area sample designs is that sample records must be
justifiable as a probability sample of all households with telephones in the state. All
participating areas except Alaska, California, Hawaii, Nevada, and Texas met this criterion in
1997.
The sample designs used by states in the 1997 BRFSS survey are shown in Table497.txt. Thirty
areas used the Mitofsky-Waksberg design. Fifteen states used a disproportionate stratified
sample (DSS) design. Two states used a simple random sample design. The remaining five
states--Alaska, California, Hawaii, Nevada, and Texas--used a variety of other designs.
The Mitofsky-Waksberg sample design is a two-stage cluster design. In the first stage, telephone
numbers are grouped into sets of 100, called primary sampling units (PSU's) or clusters. These
sets consist of telephone numbers with identical area codes, prefixes, first two digits of the
suffixes, and all possible combinations of the last two digits of the suffixes. Clusters are sampled
randomly and, within each selected cluster, a single telephone number is randomly selected to be
dialed. If this telephone number is a household, the entire PSU is selected for further sampling;
if the telephone number is not a household the entire PSU is rejected. Telephone numbers from
accepted PSU's are dialed until a target number (for the BRFSS, the target is three) of completed
interviews is obtained. The BRFSS implementation includes a third sampling stage, wherein an
adult eighteen years of age or older is randomly selected from eligible households.
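As a rough illustration of this selection logic, the following Python sketch simulates the first two
stages. It is not BRFSS production code: the callables is_household and complete_interview
stand in for actual dialing outcomes, and the third stage (random selection of one adult) is noted
only in a comment.

    import random

    TARGET_COMPLETES = 3  # BRFSS target of completed interviews per accepted PSU

    def mitofsky_waksberg(clusters, is_household, complete_interview):
        # clusters: list of PSUs, each a list of 100 telephone numbers.
        completes = []
        for psu in clusters:
            # Stage 1: dial one randomly chosen number from the PSU.
            probe = random.choice(psu)
            if not is_household(probe):
                continue  # the entire PSU is rejected
            # Stage 2: the PSU is accepted; dial further numbers until the
            # target number of completed interviews is reached. (Stage 3,
            # not shown: one adult aged 18 or older is randomly selected
            # within each contacted household.)
            psu_completes = [probe] if complete_interview(probe) else []
            remaining = [n for n in psu if n != probe]
            random.shuffle(remaining)
            for number in remaining:
                if len(psu_completes) >= TARGET_COMPLETES:
                    break
                if is_household(number) and complete_interview(number):
                    psu_completes.append(number)
            completes.extend(psu_completes)
        return completes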
In a DSS design as most commonly practiced in the BRFSS, telephone numbers are divided into
two groups, or strata, which are sampled separately. One group, the high-density stratum,
contains telephone numbers which are expected to contain a large proportion of households. The
other group, the low-density stratum, contains telephone numbers which are expected to contain
a small proportion of households. Whether a telephone number goes into the high-density or
low-density stratum is determined by the number of listed household numbers in its hundred
block. A hundred block is a set of one hundred telephone numbers with the same area code,
prefix, and first two digits of the suffix and all possible combinations of the last two digits.
Numbers that come from hundred blocks with one or more listed household numbers (1+ blocks,
or banks) are put in the high-density stratum. Numbers that come from hundred blocks with no
listed household numbers (0 blocks, or banks) are put in the low-density stratum. Both strata are
sampled to obtain a probability sample of all households with telephones. The high-density
stratum is sampled at a higher rate than the low-density stratum (that is, disproportionately) to
obtain a sample that contains a larger proportion of household numbers than would be the case if
all numbers were sampled at the same rate.
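A minimal Python sketch of this stratum assignment and disproportionate sampling follows; the
sampling rates and the lookup of listed household counts are hypothetical illustrations, not actual
BRFSS parameters.

    import random

    def assign_stratum(block, listed_counts):
        # 1+ blocks go to the high-density stratum, 0 blocks to the low-density.
        return "high" if listed_counts.get(block, 0) >= 1 else "low"

    def draw_dss_sample(numbers, listed_counts, high_rate=0.05, low_rate=0.01):
        # The high-density stratum is sampled at a higher rate than the
        # low-density stratum; the rates here are invented for illustration.
        sample = []
        for number in numbers:
            block = number[:8]  # area code + prefix + first two suffix digits
            stratum = assign_stratum(block, listed_counts)
            rate = high_rate if stratum == "high" else low_rate
            if random.random() < rate:
                sample.append(number)
        return sample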
Wisconsin, Michigan, and Oregon used variations of the DSS design described above.
Wisconsin assigned telephone numbers to high and low density strata based on previous
experience with their prefixes. Michigan defined its high density stratum as 2+ blocks and its
low density stratum as 1- blocks. Oregon used two different sample designs in 1997. In January,
Oregon used a DSS design which treated the entire state as a single geographic stratum. In
February through December, Oregon used a design which treated Multnomah County as one
geographic stratum and the rest of the state as a second geographic stratum. To represent this
difference, January data records were assigned to geographic stratum three and February through
December records were assigned to strata one or two, as appropriate.
The sample designs for Alaska, California, Hawaii, Nevada, and Texas, which, as noted above,
do not conform to the BRFSS standard for state sample designs, are described in the section
on comparability found in compar97.txt.
In most cases, each state constitutes a single stratum. In order to provide adequate sample sizes
for smaller geographically defined populations of interest, however, fourteen states sampled
disproportionately from strata defined to correspond to sub-state regions. The states with
disproportionately sampled geographic strata are Alaska, Arizona, Delaware, Hawaii, Idaho,
Maryland, Missouri, Nebraska, Nevada, Ohio, Oregon, Utah, Virginia, and Wisconsin.
Data for a state may be collected directly by the state health department or through a contractor.
Twenty-four state health departments collected their data in-house; twenty-eight contracted out
the data collection to a variety of governmental agencies, university survey research centers, and
commercial firms in 1997.
In 1997, the Behavioral Surveillance Branch provided sample for thirty-one states. Twenty-one
states purchased their sample from a commercial sample provider: twelve from Genesys
Sampling Systems, four from Survey Sampling, Inc., and five from a variety of other sources.
3. DATA COLLECTION
A. Interviewing Procedures
Interviews for 1997 were conducted through computer-assisted telephone interviewing (CATI)
by 49 areas; paper questionnaires were used in the other three surveillance areas. CDC supports
CATI programming using the Ci3 CATI software package. This support includes programming
Ci3 for states and supporting a Ci3 consultant who is available to assist states. Following
specifications provided by CDC, state health personnel or contractors conducted interviews. The
core portion of the questionnaire lasts an average of 10 minutes. Interview time for modules and
state-added questions depends on the number of questions used but generally extends the
interview by an additional 5 to 10 minutes.
Interviewer retention is very high among states that conduct the survey in-house. The state
Coordinator or interviewer supervisor usually conducts the training using materials developed by
CDC covering seven basic areas: overview of the BRFSS, role descriptions for staff involved in
the interviewing process, the questionnaire, sampling, codes and dispositions, survey follow-up,
and practice sessions. Contractors typically use interviewers who have experience conducting
telephone surveys, but these interviewers are given additional training on the BRFSS
questionnaire and procedures before they are "certified" to work on BRFSS. Further specifics on
interviewer training and procedures can be found in the BRFSS Operations Manual, 1998
available at the internet web site www.cdc.gov/nccdphp/brfss.
Monitoring of interviewers is expected. In 1997, twenty-two of the 52 surveillance projects
monitored interviews by "eavesdropping" on the interviewer without being able to hear the
respondent. Thirty states had systems connected to the telephones that enable monitoring of the
respondent and interviewer. Verification call-backs were also used by some states in lieu of
direct monitoring. Contractors typically conducted systematic monitoring, observing each
interviewer for a set amount of time each month. All states had the capability to tabulate
disposition code frequencies by interviewer. These data were the primary means for quantifying
interviewer performance. All states were required to do verification callbacks for a sample of
completed interviews as part of their quality control practices.
Telephone interviewing was conducted during a two-week period each month, and calls were
made 7 days per week, during both day and evening hours. Standard procedures in the
interviewing were followed for rotation of calls over days of the week and times of the day.
BRFSS procedural rules are contained in the BRFSS Operations Manual, 1998.
The median response rate (Upper Bound) for 1997 was 76.5%; rates ranged from 45.6% to
92.7% across states. Detailed information on interview response rates and item non-response rates is
discussed in the Summary Quality Control Report available on the internet web site:
www.cdc.gov/nccdphp/brfss.
4. DATA PROCESSING
A. Preparing for Collection and Processing of Collected Data
Data processing is an integral part of any survey. Because data are collected and sent to BSB
during each month of the year, there are routine data processing tasks that need attention at all
times during the year. In addition, there are tasks that need to be conducted at different points in
the annual BRFSS cycle.
The preparation for the survey involves a number of steps that take place once the new
questionnaire is finalized. This includes developing complete edit specifications, programming
the Ci3 CATI software, programming the PC-EDITS software, and providing sample phone
numbers for states that require them. A Ci3-CATI data entry module for each state that uses this
software is produced. Skip patterns, some consistency edits, and response-code range checks are
incorporated into the CATI system. Incorporation of edits and skip patterns
into the CATI instrument reduces interviewer errors, data entry errors, and skip errors. A
program designed to read the data file from the survey data entry module and call information
from the sample tracking module in Ci3 is written to combine information into the final format
specified for the data year. For those states that do not use CATI software to perform the BRFSS
interviews, CDC prepares and provides EPI-Info data entry formats. CDC also creates and
distributes a DOS program that can perform data validations on properly formatted survey results
files. This program is used to output lists of errors or warning conditions encountered in the
data. A program which produces tables of response rates and other interviewing statistics is
included with the edits program sent to the states.
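The following Python fragment sketches the kinds of checks such an edits program performs.
The field names, valid code ranges, and the skip rule shown are hypothetical, not the actual 1997
edit specifications.

    def edit_record(record):
        errors, warnings = [], []
        # Response-code range check: a yes/no item coded 1, 2, 7 (don't know),
        # or 9 (refused).
        if record.get("SMOKE100") not in (1, 2, 7, 9):
            errors.append("SMOKE100 out of range: %r" % record.get("SMOKE100"))
        # Skip-pattern check: a follow-up item should be blank unless the
        # gate question was answered "yes" (coded 1).
        if record.get("SMOKE100") != 1 and record.get("SMOKENOW") is not None:
            errors.append("SMOKENOW answered although the skip pattern bypasses it")
        # Consistency edit: implausible combinations are flagged as warnings.
        if record.get("AGE") is not None and record.get("AGE") < 18:
            warnings.append("AGE below 18 in an adult survey")
        return errors, warnings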
CDC begins processing data for the data year in February of that year and continues
processing data through the receipt of the final data for that year. CDC receives and tracks
monthly data submissions from the states. Once data are received from the state, editing
programs are run against the data. Any problems in the file are noted, and a BSB programmer
works with the state until the problems are resolved, or agreement is reached that no resolution
is possible. Response rate quality control reports are produced and shared with the Project
Officers, who review the reports and discuss any potential problems with the state.
After all of the data for a state are received and validated for the data year, three year-end
programs are run on the data to consolidate the monthly transmissions into a single data file for
the year, perform some additional, limited data cleanup and fixes specific to the state and data
year, and produce reports that identify potential analytic problems with the data set. Once these
programs are complete, the data are ready for assigning weights and adding new variables.
Not all of the variables that appear on the public use data tape are taken directly from the state
files. CDC prepares a set of SAS programs that implement the end of year data processing.
These programs not only prepare the data for analysis, but perform weighting and risk factor
calculations which are added as variables to the data file. The following variables are examples
of ones that result from this procedure, and are created for the user's convenience: _RFSMOK2,
RACE, _AGEG, _BMI, _TOTINDX. To create these variables, several variables from the data
file are combined. Creation of the variables varies in complexity; some only combine codes,
while others require sorting and combining selected codes from multiple variables.
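As an illustration of how such a derived variable might be computed, the sketch below builds a
body mass index value in Python. The input names, missing-value codes, and rounding are
assumptions for illustration only; the actual year-end processing was done with SAS programs.

    def derive_bmi(weight_lb, height_in):
        # Body mass index, kg/m^2, from self-reported pounds and inches.
        # Returns None when either input is missing or coded as refused;
        # the codes 999/9999 are assumed here for illustration.
        if weight_lb in (None, 999, 9999) or height_in in (None, 999, 9999):
            return None
        kg = weight_lb * 0.453592
        m = height_in * 0.0254
        return round(kg / (m * m), 2)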
Almost every variable derived from the BRFSS interview has a code category labeled "refused,"
generally given a "9", "99", or "999" value. Typically, the category consists of noninterviews
and persons for whom the question was not applicable because of a previous response or a
personal characteristic (e.g., age). However, this code may also capture some responses that were
supposed to be answered but for some reason were not, and appeared as a blank or other symbol.
The combination of these types of responses into a single code requires vigilance on the part of
data file users who wish to separate respondents who were skipped out of a question from those
who were asked but whose answer was unknown or who refused to answer a particular question.
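A small Python sketch of that separation, with a hypothetical gate question and hypothetical
codes:

    def classify_response(gate_value, item_value):
        # gate_value is the response to a hypothetical screening question;
        # only respondents answering "yes" (coded 1) are asked the item.
        if gate_value != 1:
            return "not asked"           # skipped out by the gate question
        if item_value == 7:
            return "don't know"
        if item_value in (9, None):
            return "refused or missing"  # blank responses often default here
        return "answered"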
B. Weighting the Data
When data are used without weights, each record counts the same as any other record. Implicit
in such use are the assumptions that each record has an equal probability of selection and that
non-coverage and non-response are equal among all segments of the population. When
deviations from these assumptions are large enough to affect the results obtained from a data set,
then weighting each record appropriately can help to adjust for violations of the assumptions. An
additional, but conceptually unrelated, reason for weighting is to make the total number of cases
equal to some desired number. In the BRFSS, post-stratification serves as a blanket adjustment
for non-coverage and non-response and forces the total number of cases to equal population
estimates for each geographic stratum, which for the BRFSS is usually a state.
Following is a general formula that reflects all the factors taken into account in weighting the
1997 BRFSS data. Where a factor does not apply, its value is set to one. A worked numeric
sketch follows the factor definitions below.

FINALWT = GEOWT * DENWT * (1/NPH) * NAD * CSA * POSTSTRAT
FINALWT is the final weight assigned to each record.
GEOWT is the inverse of the ratio of the estimated sampling fraction of each area code/prefix
combination subset to the area code/prefix combination subset with the largest estimated
sampling fraction. It weights for the unequal probability of selection by area code/prefix
combinations intended to cover specified geographic regions. Almost always, the regions are
discrete subsets of counties and the boundaries of the area code/prefix combinations do not
correspond exactly to the boundaries of the specified geographic regions.
DENWT is the inverse of the ratio of the sampling fraction of each subset of hundred blocks
(sets of telephone numbers with identical first eight digits and all possible final two digits)
sampled at a given rate to the hundred-block subset with the largest sampling fraction. It weights
for the unequal probability of selection by presumed household density of hundred block. This is
generally used in a design in which telephone numbers from hundred blocks with one or more
listed residential numbers (one-plus blocks) are sampled at a higher rate than telephone numbers
from hundred blocks with no listed residential numbers (zero blocks).
1/NPH is the inverse of the number of residential telephone numbers in the respondent's
household.
NAD is the number of adults in the respondent's household.
CSA is the ratio of the expected cluster size to the actual cluster size.
POSTSTRAT is the number of people in an age-by-sex or age-by-race-by-sex category in the
population of a region or a state divided by the sum of the preceding weights for the respondents
in that same age-by-sex or age-by-race-by-sex category. It adjusts for non-coverage and
non-response and forces the sum of the weighted frequencies to equal population estimates for
the region or state.
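The following worked sketch, in Python, applies the formula to one hypothetical respondent;
every input value is invented for illustration.

    # One record from the low-density stratum of a DSS design, in a region
    # sampled at 80 percent of the rate of the most heavily sampled region.
    GEOWT     = 1.25   # inverse relative sampling fraction of the region
    DENWT     = 4.0    # low-density stratum sampled at 1/4 the high-density rate
    NPH       = 2      # residential telephone numbers in the household
    NAD       = 3      # adults in the household
    CSA       = 1.0    # not applicable here, so set to one
    POSTSTRAT = 182.6  # population count for the age-by-sex category divided
                       # by the sum of the preceding weights in that category

    FINALWT = GEOWT * DENWT * (1.0 / NPH) * NAD * CSA * POSTSTRAT
    print(round(FINALWT, 1))  # 1369.5 -- this record stands for about 1,370 adults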