/* Please do not attempt to run this program without reading the accompanying documentation */
version 8.0
set more off
program drop _all

/*
File: clean f (income).do
Date: May 2012
Desc: Clean-file to create Set F
Note: See acknowledgments and copyright notice in new_master.do
*/

/* This program is to be run embedded within new_master.do.
It cleans the variables necessary for Set F and generates a codebook. */

***********************************

/*
The SIPP income definition includes three types of earnings: wages and salary, nonfarm self-employment, and farm self-employment. The definitions of nonfarm and farm self-employment income are not based on the net difference between gross receipts or sales and operating expenses, depreciation, etc. Instead, the monthly amounts for these income types are based on the salary or other income that the owner of the business or farm received from it during the 4-month reference period.

Topcoding

Three different sources of monthly employment income are identified in the SIPP public use files: (1) wage and salary income, (2) self-employed earnings, and (3) other worker arrangements. Each of these three sources is topcoded separately. For each source, monthly amounts over $12,500 (one-twelfth of the $150,000 annual benchmark) are topcoded if the total income from that source over all 4 months of the wave is greater than $50,000 (one-third of $150,000). See SIPP User's Guide, Ch. 10 for information on the 1996 panel and XX for information on the 1993 panel.

While the income amounts from most sources are recorded monthly for the 4-month reference period, property income amounts (interest, dividends, rental income, etc.) were recorded as totals for the 4-month period. These totals were distributed equally among the months of the reference period for the purpose of calculating monthly averages.
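
Illustration: this is a hypothetical sketch of the two rules described above. It is not the Census Bureau's actual topcoding code and is not executed by this program. It builds a toy wide file with one row per person-wave. The made-up variables earn1-earn4 hold one income source's monthly amounts for the 4 reference months, and prop_wave holds a 4-month property income total. All variable names here are assumptions.

    clear
    input earn1 earn2 earn3 earn4 prop_wave
    50000 0 0 0 400
    20000 20000 20000 0 1000
    end
    * per-source rule: months above 12500 are capped only if the wave total exceeds 50000
    egen wavetot = rowtotal(earn1 earn2 earn3 earn4)
    forvalues m = 1/4 {
        replace earn`m' = 12500 if earn`m' > 12500 & !missing(earn`m') & wavetot > 50000
    }
    * property income: the 4-month total is spread equally over the reference months
    forvalues m = 1/4 {
        generate prop`m' = prop_wave/4
    }

In this sketch, the first row stays at 50000, 0, 0, 0 because its wave total is not above $50,000. The second row becomes 12500, 12500, 12500, 0. The `!missing()` condition is needed because Stata treats missing values as larger than any number.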
Topcoding of Income Variables

To protect against the possibility that a user might recognize the identity of a SIPP respondent with very high income, income from every source is "topcoded" so that no individual income amounts above $150,000 are revealed. While the data dictionary indicates a topcode of $50,000 for monthly income, this topcode will rarely be used. In most cases, the monthly income is shown as an individual dollar amount of $12,500, with $12,500 actually representing "$12,500 or more" (the $150,000 annual income topcode is $12,500 multiplied by 12 months). Individual monthly amounts above $12,500 may occasionally be shown if the respondent's income varied considerably from month to month, as long as the 4-month average does not exceed $12,500. For example, if a respondent's income from a single job were concentrated in only one of the four reference months, a figure as high as $50,000 could be shown. (Income from interest or property has lower topcodes.)

Summary income figures on the person, family, and household records are simple sums of the components shown on the file after topcoding, and are not independently topcoded. Thus, a person with high income from several sources (jobs, businesses, property) could have aggregate monthly income well over the topcode for each source. Families and households with a number of high-income members could theoretically have aggregate income shown well over $150,000, though well below the $1.5 million shown as the highest allowable value in the data dictionary.

The user is cautioned against making much use of the occasional monthly figures above $12,500, except in calculating aggregates or observing patterns across the 4-month period for a single individual, family, or household. Units with higher monthly amounts shown are a biased sample of high-income units. They are more likely to include units with income from multiple sources than other units with equally high aggregate income that comes from a single source.
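
Illustration: this is a hypothetical sketch of why summary income can exceed the per-source topcode. It is not executed by this program. The names wage1, selfemp1, prop1 and sumtot1 are made up for one month's topcoded components and their total. The summary figure is just the plain sum of the components, with no further topcoding applied.

    clear
    input wage1 selfemp1 prop1
    12500 12500 300
    end
    * simple sum of already-topcoded components; no separate topcode on the total
    egen sumtot1 = rowtotal(wage1 selfemp1 prop1)
    * sumtot1 is 25300 here, twice the 12500 per-source monthly topcode plus property income

Using `rowtotal()` treats a missing component as zero. The original file's summary variables may handle missing values differently.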
See SIPP User's Guide, Ch. 10 for information on the 1996 panel and the Survey of Income and Program Participation (SIPP) 1993 Panel, Longitudinal File Codebook (p. 2-1) for information on the 1993 panel.
*/

cd $tempdata

***********************************
notes: All variables are monthly except where noted.
notes: All variables come from the longitudinal panel except where noted.
***********************************

/* In the 1990-1993 and 1996 panels, the poverty threshold variables (tfpov, thpov, tsfpov) are not monthly, so they are converted to monthly values here by dividing by 12. To obtain yearly values, multiply these variables by 12. */

/* In the 1990 panel, the minimum value of tfpov is 6024, whereas it is 0 in the other panels. The reason is that all zero values were dropped when observations with _merge!=3 were dropped in the 1990 panel. */

if $p==96 | $p==90 | $p==91 | $p==92 | $p==93 {
    replace tfpov=tfpov/12
    replace thpov=thpov/12
    replace tsfpov=tsfpov/12
}

if "$p" == "04" | "$p" == "08" {
    cd $tempdata
    use test, clear
}

* Note: total income variables include negative values (losses from investments, property, etc.).

aorder
sort $ids
label data "Set F: Income, $p SIPP ($ver)"

cd $output
saveold set_f, replace
capture log close