Sample size

marty · April 22, 2013, 7:45am

Hi,

This is perhaps a little off-topic, but I am about to deploy ODK for a survey
of households in a city in Nigeria where we are trying to establish a broad
baseline of data on solid waste management practices. We are quite lucky in
that a fairly comprehensive land-use survey was carried out 2 years ago
which mapped all (known) residential compounds in the city; and a random
sample household survey was carried out at the same time. I am trying to
identify how many households I need to survey to achieve a reasonable level
of statistical confidence (99% confidence with a 5 or 10% margin of error).

We will focus on just one district of the city for our survey (approx 4000
residential compounds) and would look to sample from these known locations
at random, and then interview every household within the sampled compounds.
Our analysis will take place at the household level, although the random
sample will be drawn from the list of compounds - a cluster sampling
approach. We have no way of knowing exactly how many households we will
find in each compound and it would be difficult both on a practical and
political level to only interview selected households at each location.

I'm struggling to find a simple explanation of how to calculate my sample
size (bearing in mind that there will inevitably be an uneven number of
households in each compound, so I may have to correct the sample size for
anticipated bias). I'm getting stuck on the fact that most sample size
equations/calculators require you to estimate a measurable effect in what
you intend to find - in this case though we are asking about 60 questions
on a range of topics and really trying to establish some baseline data as
opposed to proving/disproving a specific hypothesis....

Would anyone have any experience in this area and be willing to have a
conversation about it (perhaps offline)?!!

I am hoping to document our experiences using ODK post-survey, that's if I
ever get past this pesky sample size calculation!

Thanks in advance, and again sorry if this is a little off-topic.

Thanks

Marty

Jerry3904 · April 30, 2013, 9:23pm

Well I see no one else has jumped in, so I will throw out a starting point:

If you want a statistically valid sample, i.e., one from which you can
extrapolate to the population, then you cannot, AFAIK, avoid the question of
variance. And if your 60 variables are truly equally important, then you
have to use the variable with the highest variance to determine sample
size. Most people avoid that...
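
To make the variance point concrete, here is a rough sketch for yes/no questions, where the variance p(1-p) is largest at p = 0.5, so that worst case can stand in when nothing better is known. The design effect formula 1 + (m - 1) * icc is the usual textbook adjustment for cluster samples; the values used below for households per compound (m = 3) and intra-cluster correlation (icc = 0.1) are pure assumptions for illustration, not figures from this survey, and the function names are made up.

```python
import math
from statistics import NormalDist


def srs_sample_size(confidence, margin, p=0.5):
    """Households needed under simple random sampling for a proportion.

    p=0.5 is the worst case (largest variance) for a yes/no variable.
    """
    # Two-sided critical value, e.g. about 2.576 for 99% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z ** 2 * p * (1 - p) / margin ** 2


def cluster_sample_size(confidence, margin, households_per_compound, icc, p=0.5):
    """Inflate the simple-random-sample size by an assumed design effect.

    households_per_compound and icc are guesses that would have to come
    from earlier data (e.g. the previous household survey) or a pilot.
    """
    deff = 1 + (households_per_compound - 1) * icc
    n_households = math.ceil(srs_sample_size(confidence, margin, p) * deff)
    n_compounds = math.ceil(n_households / households_per_compound)
    return n_households, n_compounds


if __name__ == "__main__":
    for margin in (0.05, 0.10):
        households, compounds = cluster_sample_size(0.99, margin, 3, 0.1)
        print(f"margin {margin:.0%}: {households} households, {compounds} compounds")
```

This ignores the finite population correction and non-response, and uneven compound sizes would push the design effect higher than the simple formula suggests, so treat the numbers as a starting point only.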

Alternatively, you can construct estimated means and variances for your
variables with a small random sample of some appropriate number. You cannot
use it to draw conclusions about the population--though you can create
hypotheses and then test them rigorously with a statistically valid sample
size.
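
As a sketch of how a pilot could feed the calculation: estimate the variance of each variable from the pilot answers, find the variable with the highest variance, and size the main survey on that one. The variable names and 0/1 answers below are invented for illustration, and the formula assumes simple random sampling of yes/no questions, so a cluster design would still need a design-effect adjustment on top.

```python
from statistics import NormalDist, mean


def required_n(p, confidence=0.99, margin=0.05):
    """Sample size for estimating a proportion p under simple random sampling."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z ** 2 * p * (1 - p) / margin ** 2


def driving_variable(pilot):
    """Return (name, variance) of the yes/no variable with the highest variance.

    pilot maps a variable name to a list of 0/1 answers from the pilot sample.
    """
    variances = {name: mean(v) * (1 - mean(v)) for name, v in pilot.items()}
    worst = max(variances, key=variances.get)
    return worst, variances[worst]


if __name__ == "__main__":
    # Hypothetical pilot answers, not real data.
    pilot = {
        "burns_waste": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        "pays_collector": [1, 1, 0, 0, 1, 0, 1, 0, 0, 1],
    }
    name, variance = driving_variable(pilot)
    p_hat = mean(pilot[name])
    print(name, round(variance, 3), round(required_n(p_hat)))
```

Pilot estimates from a handful of households are themselves noisy, so rounding the result up generously is sensible.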

I suggest you talk to an actual statistician at a local university if
possible; there is often a mechanism for outside people to get advice on
this topic.

Best I can do...
