PandaSurvey package

Submodules

PandaSurvey.base module

class PandaSurvey.base.SurveyWeightBase(df, proportions, recodes={})[source]

Bases: object

Abstract base class for survey weighting methods.

calc()[source]

Subclasses must implement the calculation step here.

loss(weights)[source]

Describes the inflation in the variance of sample estimates that can be attributed to weighting. See Applied Survey Data Analysis (2010) by Heeringa et al. for more information.

Parameters: weights (numpy.array) – Array of individual weights.
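A common way to quantify variance inflation due to weighting is Kish's design effect, n Σw² / (Σw)². The sketch below assumes that formula; whether `loss` computes exactly this quantity is not stated here, and the function name `kish_design_effect` is hypothetical.

```python
import numpy as np


def kish_design_effect(weights):
    """Kish's approximate design effect due to weighting (hypothetical helper).

    Equals 1.0 when all weights are equal and grows as weights become
    more unequal.
    """
    w = np.asarray(weights, dtype=float)
    return w.size * np.sum(w ** 2) / np.sum(w) ** 2


# Equal weights: no variance inflation.
equal = kish_design_effect([1.0, 1.0, 1.0, 1.0])

# Unequal weights: the estimate's variance is inflated.
uneven = kish_design_effect([0.5, 0.5, 1.5, 1.5])
```

Under this formula, a value of 1.25 would mean the weighted sample behaves roughly like an unweighted sample 1.25 times smaller.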
recode(encoders)[source]

Recodes demographic information.

Parameters: encoders (dict) – Mapping of demographic keys to their respective recoding functions. Only the demographic dimensions that need recoding must be defined.
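As an illustration of the encoder mapping, the sketch below applies one function per demographic column with plain pandas. The helper `apply_encoders`, the `age_bracket` cut-offs, and the example rows are all hypothetical; it does not call `recode` itself, whose internals are not shown here.

```python
import pandas as pd


def apply_encoders(df, encoders):
    """Apply each recoding function to its column (hypothetical helper).

    Columns without an entry in `encoders` are left unchanged.
    """
    out = df.copy()
    for column, func in encoders.items():
        out[column] = out[column].map(func)
    return out


def age_bracket(age):
    """Collapse a raw age into a coded bracket (cut-offs are made up)."""
    if age < 18:
        return 1
    if age < 35:
        return 2
    if age < 55:
        return 3
    return 4


# Hypothetical rows; only Age needs recoding, so only Age is listed.
people = pd.DataFrame({"Age": [12, 25, 40, 70], "Gender": [1, 2, 2, 1]})
recoded = apply_encoders(people, {"Age": age_bracket})
```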

PandaSurvey.simple module

class PandaSurvey.simple.SimpleRake(df, proportions, epsilon=0.001, maxiter=1000)[source]

Bases: PandaSurvey.base.SurveyWeightBase

Rake weighting implementation.

Parameters:
  • df (pandas.DataFrame) – The observations to be weighted.
  • proportions (dict) – A dictionary of the target proportions for each demographic and response.
  • epsilon (float) – Convergence threshold for the raking procedure.
  • maxiter (int) – Maximum number of iterations for the raking procedure.
calc(use_l2=False)[source]

Calculates individual weights.

Parameters: use_l2 (boolean) – If True, convergence is measured using the L2 norm of the change in weights. By default, the L1 norm is used.
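For readers unfamiliar with raking, the following is a minimal sketch of generic iterative proportional fitting using the same parameter names (`epsilon`, `maxiter`, `use_l2`). It is not the PandaSurvey source: the function `rake`, the nested-dict layout of `proportions`, the starting weights of 1, and the example data are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def rake(df, proportions, epsilon=0.001, maxiter=1000, use_l2=False):
    """Generic raking sketch (hypothetical, not SimpleRake itself).

    `proportions` is assumed to look like {column: {code: target_share}}.
    """
    weights = np.ones(len(df), dtype=float)
    for _ in range(maxiter):
        previous = weights.copy()
        for column, targets in proportions.items():
            total = weights.sum()
            for code, target in targets.items():
                mask = (df[column] == code).to_numpy()
                current = weights[mask].sum() / total
                if current > 0:
                    # Scale this group so its weighted share hits the target.
                    weights[mask] *= target / current
        delta = weights - previous
        # L1 norm by default, L2 norm on request.
        change = np.sqrt(np.sum(delta ** 2)) if use_l2 else np.sum(np.abs(delta))
        if change < epsilon:
            break
    return weights


def weighted_share(df, weights, column, code):
    """Weighted share of rows where `column` equals `code`."""
    mask = (df[column] == code).to_numpy()
    return weights[mask].sum() / weights.sum()


# Hypothetical four-respondent sample and targets.
sample = pd.DataFrame({"Gender": [1, 1, 1, 2], "Age": [1, 2, 1, 2]})
targets = {"Gender": {1: 0.5, 2: 0.5}, "Age": {1: 0.4, 2: 0.6}}
weights = rake(sample, targets)

gender_share = weighted_share(sample, weights, "Gender", 1)
age_share = weighted_share(sample, weights, "Age", 1)
```

In this sketch the last demographic adjusted in each pass matches its targets exactly, while earlier ones match only to within the convergence threshold.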

Module contents

PandaSurvey includes two datasets for testing purposes: People and a sample study. The People file is from the 2010 US Census. The sample study is from a small survey performed at InContext Solutions in 2014 (specific survey details withheld).

PandaSurvey.load_people()[source]

Returns the People dataset as a DataFrame. The data consists of 9999 individuals with age, disability status, marital status, race, and gender demographic information. Columns and their codes are described below:

  • Age
    • Non-negative integer
    • May include zeros
  • Disability
    • 1: Disabled
    • 2: Not disabled
  • MarritalStatus
    • 1: Married
    • 2: Widowed
    • 3: Divorced
    • 4: Separated
    • 5: Never married or under 15 years old
  • Race
    • 1: White alone
    • 2: Black or African American alone
    • 3: American Indian alone
    • 4: Alaska Native alone
    • 5: American Indian and Alaska Native tribes specified; or American Indian or Alaska Native, not specified and no other races
    • 6: Asian alone
    • 7: Native Hawaiian and Other Pacific Islander alone
    • 8: Some other race alone
    • 9: Two or more major race groups
  • Gender
    • 1: Male
    • 2: Female
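The coded columns can be turned into readable labels with a dictionary lookup. The sketch below uses a few made-up rows in place of the real dataset and copies the Disability and Gender codes from the list above; the variable names are illustrative only.

```python
import pandas as pd

# Code-to-label mappings copied from the column descriptions.
DISABILITY = {1: "Disabled", 2: "Not disabled"}
GENDER = {1: "Male", 2: "Female"}

# Hypothetical rows standing in for the DataFrame returned by load_people().
people = pd.DataFrame(
    {"Age": [0, 34, 71], "Disability": [2, 2, 1], "Gender": [1, 2, 2]}
)

# Replace the numeric codes with their labels, leaving Age untouched.
labelled = people.assign(
    Disability=people["Disability"].map(DISABILITY),
    Gender=people["Gender"].map(GENDER),
)
```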
PandaSurvey.load_sample_proportions()[source]

Returns the target sample proportions that correspond to the sample survey.

Demographic  Coded Value  Target Proportion
Age          1            0.07
Age          2            0.22
Age          3            0.2
Age          4            0.2
Age          5            0.21
Gender       1            0.5
Gender       2            0.5
Income       1            0.17
Income       2            0.21
Income       3            0.25
Income       4            0.16
Income       5            0.11
Hispanic     1            0.09
Hispanic     2            0.91
Race         0            0.15
Race         1            0.85
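A quick sanity check on the targets is to sum them per demographic. The sketch below hand-copies the values listed above into a nested dictionary; that layout is an assumption, since the actual return type of `load_sample_proportions()` is not shown here.

```python
# Target proportions copied by hand from the listing (layout is assumed).
proportions = {
    "Age": {1: 0.07, 2: 0.22, 3: 0.2, 4: 0.2, 5: 0.21},
    "Gender": {1: 0.5, 2: 0.5},
    "Income": {1: 0.17, 2: 0.21, 3: 0.25, 4: 0.16, 5: 0.11},
    "Hispanic": {1: 0.09, 2: 0.91},
    "Race": {0: 0.15, 1: 0.85},
}

# Sum the targets for each demographic, rounded to absorb float noise.
totals = {
    name: round(sum(targets.values()), 2)
    for name, targets in proportions.items()
}

# Demographics whose listed targets do not add up to 1.
incomplete = [name for name, total in totals.items() if total != 1.0]
```

As listed, the Age and Income targets each sum to 0.90 rather than 1, which is worth knowing before passing them to a raking routine.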
PandaSurvey.load_sample_study()[source]

Returns a sample dataset describing demographics in coded format from 2092 respondents. The study consists of 7 cells, and the demographics considered are age, gender, income, Hispanic origin, and race.

PandaSurvey.load_sample_weights()[source]

Returns individual weights from the sample survey calculated via a raking method previously implemented in R.