On this page we discuss several topics: 1) probability versus non-probability sampling, 2) modes
of sexual orientation data collection, and 3) sample size. The full discussion of constructing
samples is beyond the scope of this website. For more detailed information, see appropriate texts
on sampling (such as the classic text Applied Sampling by Seymour Sudman, 1976) or contact us for
guidance. The topics discussed here were chosen because they are some of the more common
concerns that arise when sampling LGBs.

PROBABILITY VERSUS NON-PROBABILITY SAMPLING: There are two types of sampling
methods: probability sampling and non-probability sampling. The difference between them is
that in probability sampling, every unit in the population has a known, nonzero chance of being
selected. This is not true for non-probability sampling, where some units may have no chance of
being selected and the chance for the others cannot be calculated. Historically, samples of LGBs were non-probability
samples drawn from locations such as mental institutions, prisons, or bars.  Not surprisingly,
data from these samples were biased in ways that stigmatized LGBs and supported arguments
made by some that they were inherently "sick."  With the advent of probability samples, many
but not all of these myths have been dispelled.

Because probability sampling allows results to be generalized to larger populations, this
website has focused on data sources that have used this method. Probability sampling involves
the selection of a sample from a population based on the principle of randomization or chance.
Probability sampling is more complex, more time-consuming and usually more costly than
non-probability sampling. However, because units from the population are randomly selected and
each unit's probability of inclusion can be calculated, reliable estimates can be produced along
with estimates of the sampling error, and inferences can be made about the population.
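
As a minimal sketch of what "estimates of the sampling error" means in practice, the Python
function below estimates a proportion (for example, the share of respondents who identify as LGB)
from a simple random sample drawn without replacement. It also returns the estimate's standard
error and a normal-approximation 95% confidence interval. It assumes simple random sampling and
0/1 responses. The function name and the example data are hypothetical. Designs with strata,
clusters or weights need different variance formulas.

```python
import math


def srs_proportion(responses, population_size, z=1.96):
    """Estimate a population proportion from a simple random sample
    drawn without replacement (assumes SRS; not valid for complex designs).

    responses: list of 0/1 values (1 = respondent has the characteristic).
    population_size: N, the number of units in the sampled population.
    Returns (estimate, standard_error, (ci_low, ci_high)).
    """
    n = len(responses)
    if n < 2 or n > population_size:
        raise ValueError("need 2 <= n <= population_size")
    p_hat = sum(responses) / n
    # Unbiased variance estimator under SRS without replacement:
    # (1 - n/N) * p_hat * (1 - p_hat) / (n - 1)
    sampling_fraction = n / population_size
    variance = (1 - sampling_fraction) * p_hat * (1 - p_hat) / (n - 1)
    se = math.sqrt(variance)
    # Normal-approximation confidence interval
    return p_hat, se, (p_hat - z * se, p_hat + z * se)


# Hypothetical data: 30 of 300 sampled respondents have the characteristic.
responses = [1] * 30 + [0] * 270
estimate, se, ci = srs_proportion(responses, population_size=100_000)
print(f"estimate={estimate:.3f}  se={se:.4f}  95% CI=({ci[0]:.3f}, {ci[1]:.3f})")
```

The factor (1 - n/N) is the finite population correction. It is close to 1 for a survey of a large
population and reaches 0 when the whole population is enumerated.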

There are several different ways in which a probability sample can be selected. The method
chosen depends on a number of factors, such as the available sampling frame, how spread out the
population is, how costly it is to survey members of the population and how users will analyze the
data. When choosing a probability sample design, your goal should be to minimize the sampling
error of the estimates for the most important survey variables, while simultaneously minimizing
the time and cost of conducting the survey.  The following are the most common probability
sampling methods:
  • simple random sampling - In simple random sampling, each member of a population has
    an equal chance of being included in the sample.
  • systematic sampling - Sometimes called interval sampling, systematic sampling starts at a
    randomly chosen unit in an ordered list and then selects units at a fixed gap, or interval
    (for example, every 20th address).
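  • sampling with probability proportional to size - Probability sampling requires that each
    member of the survey population have a chance of being included in the sample, but it does
    not require that this chance be the same for everyone. In sampling with probability
    proportional to size (PPS), each unit's chance of selection is proportional to a measure of
    its size, such as the number of households in a city block.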
  • stratified sampling - Using stratified sampling, the population is divided into
    homogeneous, mutually exclusive groups called strata, and then independent samples are
    selected from each stratum.
  • cluster sampling - Cluster sampling divides the population into groups or clusters. A
    number of clusters are selected randomly to represent the total population, and then all
    units within selected clusters are included in the sample.
  • multi-stage sampling - Multi-stage sampling is like the cluster method, except that it
    involves picking a sample from within each chosen cluster, rather than including all units
    in the cluster.
  • multi-phase sampling - A multi-phase sample collects basic information from a large
    sample of units and then, for a subsample of these units, collects more detailed
    information.
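
To make the differences between these designs concrete, here is a minimal Python sketch of
several of the selection methods listed above. It uses only the standard library's random module.
It is illustrative rather than a production sampling tool:

- the function names, the numbered frame, the strata and the blocks are all hypothetical;
- the systematic sampler uses a whole-number interval;
- the PPS sampler draws with replacement, which is only one of several PPS schemes.

Multi-phase sampling is not shown. It would apply one of these selection steps again to the
first-phase sample.

```python
import random
from collections import defaultdict


def simple_random_sample(frame, n, rng):
    """Every unit has the same inclusion probability, n / len(frame)."""
    return rng.sample(frame, n)


def systematic_sample(frame, n, rng):
    """Random start, then every k-th unit of an ordered frame.

    Simplification: k = len(frame) // n is a whole number, so when the frame
    size is not a multiple of n the last few units can never be selected.
    """
    k = len(frame) // n
    start = rng.randrange(k)
    return [frame[start + i * k] for i in range(n)]


def pps_sample_with_replacement(frame, sizes, n, rng):
    """Each draw picks a unit with probability size / total size."""
    return rng.choices(frame, weights=sizes, k=n)


def stratified_sample(frame, stratum_of, n_per_stratum, rng):
    """Independent simple random sample of n_per_stratum[h] units in stratum h."""
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    return {h: rng.sample(units, n_per_stratum[h]) for h, units in strata.items()}


def cluster_sample(clusters, n_clusters, rng, n_within=None):
    """Randomly select clusters.

    n_within=None keeps every unit in each chosen cluster (cluster sampling);
    an integer subsamples that many units per chosen cluster (two-stage
    sampling, the simplest multi-stage design).
    """
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for c in chosen:
        units = clusters[c]
        sample.extend(units if n_within is None else rng.sample(units, n_within))
    return sample


rng = random.Random(2024)   # seeded so the example is reproducible
frame = list(range(1000))   # hypothetical frame of 1,000 numbered units
print(simple_random_sample(frame, 5, rng))
print(systematic_sample(frame, 5, rng))
print(pps_sample_with_replacement(["small", "large"], [1, 9], 5, rng))
print(stratified_sample(frame, lambda u: u % 4, {h: 2 for h in range(4)}, rng))
blocks = {b: list(range(b * 10, b * 10 + 10)) for b in range(100)}  # 100 hypothetical blocks of 10
print(cluster_sample(blocks, 2, rng))              # 2 whole blocks = 20 units
print(cluster_sample(blocks, 2, rng, n_within=3))  # 3 units from each of 2 blocks
```

Seeding a random.Random instance keeps each draw reproducible. Passing the generator explicitly
also keeps the functions free of hidden global state.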

For detailed descriptions of each method, see appropriate texts on sampling (such as the classic
text Applied Sampling by Seymour Sudman, 1976) or contact us for guidance. Screeners can also be
used with any of these methods. A screener is a tool, typically a short set of questions, used to
identify the persons (units) of interest within a sample. For an example of a screener that was
used to identify lesbians, gays and bisexuals, see the Kaiser Screener.
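
LGB respondents make up a small share of most general populations. A screener therefore has to
be administered to many more people than will end up in the final sample. The arithmetic below
is a rough planning sketch under stated assumptions. The prevalence and response rates are
hypothetical placeholders, not figures from any survey described on this website. The sketch
also assumes that nonresponse is unrelated to sexual orientation. The result is an expected
value, so actual yields will vary.

```python
import math


def screens_needed(target_completes, prevalence, screener_response_rate,
                   interview_response_rate):
    """Expected number of sampled persons to attempt to screen in order to
    complete `target_completes` interviews with eligible respondents.

    All rates are proportions between 0 and 1. Assumes nonresponse is
    unrelated to eligibility (a simplifying, hypothetical assumption).
    """
    yield_per_attempt = prevalence * screener_response_rate * interview_response_rate
    return math.ceil(target_completes / yield_per_attempt)


# Hypothetical planning inputs: 400 completed LGB interviews, 3% prevalence,
# 60% of sampled persons complete the screener, 80% of eligibles complete
# the main interview.
print(screens_needed(400, 0.03, 0.60, 0.80))  # -> 27778
```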

MODE OF SEXUAL ORIENTATION DATA COLLECTION: As demonstrated in the surveys described on this
website, sexual orientation data has now been collected 1) face-to-face, 2) over the telephone,
3) using audio computer-assisted self-interviewing (audio-CASI), 4) in mail surveys, 5) using
self-completed questionnaires, and 6) over the internet. As each method was first attempted,
there was understandably some trepidation about whether it would work. However, we now know that
data can be successfully collected using each of these methods. That said, further research on
the relative benefits and limitations of each is needed. For more information on how well any of
these methods has worked, please contact the survey administrators who have used it, or
contact us.

SAMPLE SIZE: The level of precision needed for survey estimates (such as estimates of the
prevalence of gays or lesbians in a population, or the prevalence of smoking among gays and
lesbians) will affect the sample size that one needs to draw. Unfortunately, determining the
sample size is not as easy as one might think. Generally, the final sample size of a survey is a
compromise between the level of precision to be achieved, the survey budget and other operational
constraints, such as time. The sample size needed to achieve a given level of precision depends,
among other things, on the following factors:
  • The variability of the characteristics being observed: If every person in a population had
    the same sexual orientation, then a sample of one person would be all you would need to
    estimate the distribution of sexual orientation in the population. The more sexual
    orientation varies across the population, the bigger the sample you need to produce a
    reliable estimate.
  • The population size: To a certain extent, the bigger the population, the bigger the sample
    needed. But once you reach a certain level, an increase in population no longer affects the
    sample size. For instance, the necessary sample size to achieve a certain level of precision
    will be about the same for a population of one million as for a population twice that size.
  • The sampling and estimation methods: Not all sampling and estimation methods have
    the same level of efficiency. You will need a bigger sample if your method is not the most
    efficient. But because of operational constraints or the lack of an adequate sampling frame,
    you cannot always use the most efficient technique.
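
All three factors above appear in the textbook formula for the sample size needed to estimate a
proportion p to within a margin of error e. Start with n0 = z^2 * p * (1 - p) / e^2. Then adjust
for a finite population of size N: n = n0 / (1 + (n0 - 1) / N).

- The term p * (1 - p) captures variability. It is largest at p = 0.5.
- N enters only through the finite population correction.
- A design's efficiency can be folded in through a design effect (deff). The deff is the ratio of
  a design's variance to that of a simple random sample of the same size.

The Python sketch below assumes simple random sampling, apart from the deff multiplier, and a
normal approximation. The function name and the inputs are illustrative.

```python
import math


def required_sample_size(margin, population_size, p=0.5, z=1.96, deff=1.0):
    """Sample size needed to estimate a proportion p to within +/- margin.

    p=0.5 is the conservative (largest-variance) choice when p is unknown.
    deff > 1 inflates n for designs less efficient than simple random
    sampling, such as cluster samples.
    """
    n0 = deff * z**2 * p * (1 - p) / margin**2
    # Finite population correction: matters only when n0 is a noticeable
    # fraction of the population.
    n = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n)


# Population size barely matters once it is large...
for N in (5_000, 1_000_000, 2_000_000):
    print(N, required_sample_size(0.03, N))
# ...whereas a less efficient design (hypothetical deff = 2) roughly doubles n.
print(required_sample_size(0.03, 1_000_000, deff=2.0))
```

With a margin of 3 percentage points, the required sizes for populations of one million and two
million differ by a single respondent. This matches the population-size point above.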

Estimating overall sample sizes in order to examine a topic of interest is always a challenge. The
best guidance comes from surveys that have already been conducted, so when choosing a sample size
it is to your advantage to review data sources that have already sampled LGBs.
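
Earlier surveys are useful partly because they estimate what share of respondents identify as
LGB. That share determines how many LGB respondents a general-population sample of a given size
is likely to yield. It therefore also determines how precise estimates for that subgroup can be
(for example, smoking prevalence among LGB respondents). The sketch below makes three
assumptions: simple random sampling, a large population (no finite population correction) and a
hypothetical 3% prevalence. In practice you would substitute a figure from a comparable survey.

```python
import math


def subgroup_margin_of_error(total_n, subgroup_share, p=0.5, z=1.96, deff=1.0):
    """Approximate margin of error for a proportion estimated within a
    subgroup (e.g. LGB respondents) of a general-population sample.

    subgroup_share: expected fraction of the sample in the subgroup,
    taken from earlier surveys (the value used below is hypothetical).
    """
    expected_subgroup_n = total_n * subgroup_share
    return z * math.sqrt(deff * p * (1 - p) / expected_subgroup_n)


# Hypothetical: 3% of respondents identify as LGB.
for total_n in (10_000, 40_000):
    moe = subgroup_margin_of_error(total_n, 0.03)
    print(f"total n={total_n:,}: about {total_n * 0.03:.0f} LGB respondents, "
          f"margin of error +/- {moe:.3f}")
```

Quadrupling the total sample only halves the subgroup margin of error. This is one reason
subgroup analyses of LGB populations often call for large overall samples or screeners.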

Up until the mid-1990s it was widely believed that representative samples of LGBs were too
difficult or even impossible to draw. It was thought that people wouldn't identify as lesbian, gay
or bisexual to researchers, or that the populations were so rare that it wasn't economically
feasible. The surveys described on this website have shown that representative samples can be
drawn economically.

Today, sexual orientation data is generally not collected either because researchers and program
planners don't think to collect the data (because it hasn't crossed their mind or they don't know
the relevance to their work), or for political reasons having nothing to do with science or
community needs.


GayData.org: Sampling