Here are some archives that our berkeley.edu domain makes available to us. These
sites would be extremely expensive otherwise and are often useful. My web site,
http://socrates.berkeley.edu/~mash for the moment, contains links to many other text
and data archives.
http://www.jstor.org
    Complete, full-image collections of many journals, economics and otherwise
http://www.lexis-nexis.com/universe
    Premier news and legal archive
http://www.eb.com:180/
    Encyclopaedia Britannica
http://webspirs.silverplatter.com/cgi-bin/customers/ucb/ucb2b.cgi
    EconLit
Another useful web trick is that you can ftp (file transfer protocol, used here as a verb)
datasets within shell scripts, makefiles (see Section 7), or SAS/Stata scripts. For example,
to do this in SAS, include the line:
x "ncftp -f emlab.berkeley.edu:/pub/data/89raw.txt.Z;" ;
to ftp the named file to the directory from which you launched SAS. By the way, I recommend
ncftp, which is a souped-up version of ftp, both interactively and in scripts.
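When a transfer is scripted, it is worth skipping it if the file is already on disk. The
following is a minimal shell sketch of that idea, not part of the original scripts: the
function name fetch_if_missing is made up for illustration, and the transfer command is
simply whatever you pass after the file name.

```shell
# fetch_if_missing FILE COMMAND...: run COMMAND only when FILE is absent.
# (Hypothetical helper; COMMAND is expected to create FILE.)
fetch_if_missing() {
    file=$1
    shift
    if [ -e "$file" ]; then
        echo "$file already present; skipping transfer"
    else
        "$@"
    fi
}

# Example call (commented out because it needs network access):
# fetch_if_missing 89raw.txt.Z ncftp -f emlab.berkeley.edu:/pub/data/89raw.txt.Z
```

The check is only on existence, so a stale or partial file would not be fetched again.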
7 make
The program make is a versatile project management tool. Make allows the user to specify
a series of "dependencies" among files so that you can be sure that your data and output
are up to date.[5]
Suppose that in the course of your research, you ftp several years of Current Population
Survey data from emlab.berkeley.edu to /scratch/public. You then read the data into
several SAS datasets. You then convert the SAS datasets into Stata datasets (see Section
9), merge them with some data that are already in Stata format, and perform some statistical
procedures in Stata. You don't want to waste time (not to mention file access and bandwidth)
by repeating the particularly time-consuming parts of the process, e.g., the file transfers and
inputs of raw data.
Create a file called Makefile in your project directory. Makefile should contain dependency
lines and command lines. The dependency lines state that the file[6] to the left of the colon
depends on the files to the right of the colon. If the right-hand files are more recent than
the left-hand files, then the set of commands in the block of command lines following the
dependency line will be executed. Note: each of the command lines must begin with a TAB
character.
[5] make was originally designed for software development, but I think that it meets the
needs of empirical economists extremely well.
[6] In more complicated uses, this is not limited to being a file.

# If the Stata output has changed, extract and label the regression results
# with the homegrown perl scripts, run LaTeX on the extracted
# results, and convert the .dvi file to a PostScript file.
final.ps : final.output labelvars.pl
	stat2lat.pl < final.output | labelvars.pl > final.tex
	latex2e final.tex
	dvips -p/tmp/final.ps final.dvi
# If either the merged dataset or the do-file for the statistical procedures
# has changed, run the statistical procedure in Stata.
final.output : final.do final.dta
	stata -b do final.do
# If any of the input data for the merged data file has changed,
# rerun the merging do-file.
final.dta : cps79.dta cps89.dta otherdat.dta
	stata -b do mergeall.do
# If the do-file that creates the other data has changed,
# recreate the other data.
otherdat.dta : otherdat.do
	stata -b do otherdat.do
# If the SAS program that reads the raw 1979 CPS data has changed, rerun
# this program and convert the dataset to Stata format.
cps79.dta : cps79.sas
	sas cps79.sas
	sas2stata -f -r cps79.ssd01
# If the SAS program that reads the raw 1989 CPS data has changed, rerun
# this program and convert the dataset to Stata format.
cps89.dta : cps89.sas
	sas cps89.sas
	sas2stata -f -r cps89.ssd01
# If the 1979 raw data has disappeared, e.g., because scratch has been
# cleaned, ftp it from emlab again.
cps79.sas : 79raw.txt
	ncftp -f emlab.berkeley.edu:/pub/data/79raw.txt
	touch 79raw.txt
# If the 1989 raw data has disappeared, e.g., because scratch has been
# cleaned, ftp it from emlab again.
cps89.sas : 89raw.txt
	ncftp -f emlab.berkeley.edu:/pub/data/89raw.txt
	touch 89raw.txt
Now if you type make final.ps, make will read Makefile and perform only and all the
tasks required to produce an up-to-date version of final.ps, going all the way back to raw data
if necessary. Make is capable of much more than I've described. There is an excellent book on
make called Managing Projects with Make (Oram and Talbott 1991) published by O'Reilly
and Associates,[7] which costs about $20, and a very good document called "GNU Make"
(Stallman and McGrath 1988) available with the most current distribution of make from the
Free Software Foundation at http://www.gnu.ai.mit.edu/software/make/make.html.
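The timestamp test that make applies to each dependency line can be approximated in a few
lines of shell. This is only a sketch of the idea, not how make itself is implemented: the
function name is_stale is made up for illustration, and the -nt ("newer than") comparison
is a widely supported shell extension rather than strict POSIX.

```shell
# is_stale TARGET PREREQ...: succeed (exit 0) when TARGET needs rebuilding,
# i.e. when it is missing or when any prerequisite is newer than it.
is_stale() {
    target=$1
    shift
    [ -e "$target" ] || return 0
    for prereq in "$@"; do
        if [ "$prereq" -nt "$target" ]; then
            return 0
        fi
    done
    return 1
}

# Example: mirror the rule "final.dta : cps79.dta cps89.dta otherdat.dta".
if is_stale final.dta cps79.dta cps89.dta otherdat.dta; then
    echo "final.dta is out of date; make would rerun mergeall.do"
fi
```

Unlike this sketch, make also brings each prerequisite up to date first, which is what lets
it go all the way back to the raw data.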
8 A mail filter
If you subscribe to active mailing lists, e.g., Statalist, you may want to sort your e-mail to
different inboxes.
1. Create a mail filter file, e.g., .mailfilter, modeled on Table 10. The filtering is based
on choosing a text string that will appear only in mail intended for a particular inbox,
e.g., "statalist". Make this .mailfilter file executable with chmod +x .mailfilter
2. Edit your .pinerc file to identify all of your inboxes. Table 11 contains the portion of
.pinerc that must be modified.
3. Create a .forward file that pipes all of your mail to your mail filter. Your .forward
file should contain: |/accounts/grad/userid/.mailfilter
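Before wiring a filter into .forward, it can help to check which inbox a saved message would
land in. The sketch below reuses the string-matching idea of Table 10 but only prints a
folder name and delivers nothing. The function name pick_folder and the sample message are
hypothetical, and grep -q is used here for a silent match.

```shell
# pick_folder FILE: print the folder a message would be routed to,
# using the same case-insensitive strings as Table 10.
pick_folder() {
    msg=$1
    if grep -q -i "lbo-talk" "$msg"; then
        echo penmail
    elif grep -q -i "statalist" "$msg"; then
        echo stata
    elif grep -q -i -e "sas-l" -e "saspac" -e "labor-data" "$msg"; then
        echo datalist
    else
        echo inbox
    fi
}

# Try it on a made-up message; this one mentions statalist, so it prints "stata".
sample=$(mktemp)
printf 'From someone Mon Jan  1 00:00:00 1997\nTo: statalist@example.org\n\nHello\n' > "$sample"
pick_folder "$sample"
rm -f "$sample"
```

Because the match is on any occurrence of the string, a message that merely quotes one of
these list names would also be routed to that folder.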
9 Database Conversion
1. The unix utility sas2stata on the EML will convert unix SAS datasets (*.ssd01) to
all-platform Stata (*.dta) datasets, keeping variable names, lengths, and labels intact.
Thus,
% sas2stata mydata.ssd01
will create mydata.dta. For brief or detailed documentation, you can try, respectively:
% sas2stata
% man sas2stata
[7] O'Reilly and Associates have many excellent books on some of the tools discussed in this
document. In particular, keep an eye out for their ...in a Nutshell series.
Table 10: .mailfilter filters mail into 4 incoming folders
#!/bin/sh
PATH=/bin:/usr/bin:/usr/ucb
export PATH
user=mash
if [ "`whoami`" != "$user" ]; then
    exit 1
fi
mailbox=/var/spool/mail/$user
home=/srv/accounts/grad/$user
penmail=$home/mail/penmail
stata=$home/mail/stata
datalist=$home/mail/datalist
tmp=$home/.tmp
# Save the incoming message, then route it by the first matching string.
cat - > $tmp
if grep -s -i "lbo-talk" $tmp
then
    sed -e '2,$ s/^From />From /' $tmp >> $penmail
    echo >> $penmail
elif grep -s -i "statalist" $tmp
then
    sed -e '2,$ s/^From />From /' $tmp >> $stata
    echo >> $stata
elif grep -s -i "sas-l" $tmp
then
    sed -e '2,$ s/^From />From /' $tmp >> $datalist
    echo >> $datalist
elif grep -s -i "saspac" $tmp
then
    sed -e '2,$ s/^From />From /' $tmp >> $datalist
    echo >> $datalist
elif grep -s -i "labor-data" $tmp
then
    sed -e '2,$ s/^From />From /' $tmp >> $datalist
    echo >> $datalist
else
    sed -e '2,$ s/^From />From /' $tmp >> $mailbox
    echo >> $mailbox
fi
rm -f $tmp
exit 0
Table 11: Name incoming folders in .pinerc
# incoming-folders are those other than INBOX that receive new messages.
# Folder syntax: optnl-label {optnl-imap-hostname}folder-path
# Use only if you filter incoming email into multiple files or receive
# email on several different machines.
# Example:
# incoming-folders=Consulting {carson.u.washington.edu}filter/to-help,
#                  Widget-Project {carson.u.washington.edu}filter/to-widget,
#                  Old-Student-Acct {imap.berkeley.edu}inbox
# Michael Ash's incoming folders:
incoming-folders=penmail /srv/accounts/grad/mash/mail/penmail,
                 stata /srv/accounts/grad/mash/mail/stata,
                 datalist /srv/accounts/grad/mash/mail/datalist
Note that sas2stata runs both SAS and Stata as well as some of the classic unix utilities
like awk or sed in the process of writing the dataset. So you are limited to UNIX systems
that have BOTH applications, e.g., the EML but not socrates.berkeley.edu (SAS
but not Stata). I think that sas2stata is available free from the RAND Corporation
if you want to install it on your own UNIX system.
2. On the EML (or for PCs if you buy it), you can also use
% dbmscopy
an interactive utility that will do lots of cross-program dataset conversions, e.g., PC
SAS <-> unix SAS <-> Stata <-> Excel <-> Lotus 1-2-3, etc. After you run dbmscopy
interactively several times, you can learn the syntax to use dbmsnox, a conversion
program that runs from the command line. For example,
% dbmsnox /tmp/mydata.dbf /tmp/mydata.stata4
will convert dBase file mydata.dbf into Stata file mydata.dta.
3. For PCs, you can buy Stat/Transfer, a utility sold (but not written) by Stata
Corporation (http://www.stata.com). The academic price is low (c. $50), and it does lots
of cross-program dataset conversions. I think this is well worth it if you buy the Stata
package for the PC.
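When many SAS datasets need converting with sas2stata (item 1), a loop can pick out only
those whose Stata copy is missing or older than the SAS file. The sketch below is an
assumption-laden illustration, not part of the EML tools: the function name
list_conversions is hypothetical, and it prints the sas2stata commands instead of running
them so that the list can be inspected first.

```shell
# list_conversions DIR: print a sas2stata command for every *.ssd01 file in
# DIR whose matching .dta file is missing or older than the SAS dataset.
list_conversions() {
    dir=$1
    for sasfile in "$dir"/*.ssd01; do
        [ -e "$sasfile" ] || continue    # no matches: the glob stays literal
        dtafile=${sasfile%.ssd01}.dta
        if [ ! -e "$dtafile" ] || [ "$sasfile" -nt "$dtafile" ]; then
            echo "sas2stata $sasfile"
        fi
    done
}

# List the conversions that are pending in the current directory.
list_conversions .
```

Once the list looks right, the printed commands can be run on a machine that has both SAS
and Stata, for example by piping the output to sh.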
A psidcode.pl
#!/usr/local/bin/perl
# psidcode.pl
# michael ash
# march 1997
# Bug reports to mash@econ.berkeley.edu
# Main use: Parses the data-center file created during the creation
# of a PSID data set at \url{www.umich.edu/~psid} for
# year, level (individual or family), and variable name (V#####).
# Writes codebook for those variables.
# Non-EML users should make sure that the documentation directory
# is properly specified.
# Reads any input file with rows containing
# Year Level Variable
# in that order, e.g.,
# 1984 Family V10263
# Reads "data-center" in current directory
# Reads zipped PSID documentation in /archive/psid_all/documentation
# Writes "codebook" in current directory
# Glitches:
# includes page breaks and page headers from the PSID codebooks.
# Selecting the last variable in a year may cause too much output.
open(VLIST,"<$ARGV[0]");
open(CODEBOOK,">codebook");
# Read included variables and individual or family.
# If family, read year too.
while ($line = <VLIST>) {
  chomp($line);
  $line =~ s/^\s+//;
  ($year,$level,$vname,@junk) = split(/\s+/,$line);
  $level =~ s/^\s+// ;
  $level =~ s/\s+$// ;
  $vname =~ s/^\s+// ;
  $vname =~ s/\s+$// ;
  $vname =~ s/^V//;
  if ($level eq "Individual") {
    $flist{ind} = "unzip -c /archive/psid_all/documentation/68-92doc.zip |" ;
    $vlist{ind} .= "$vname:";
  }
  elsif ($level eq "Family") {
    if ($year >= 1968 && $year <= 1978) {
      $flist{fam6878} = "unzip -c /archive/psid_all/documentation/68-78doc.zip |" ;
      $vlist{fam6878} .= "$vname:"
    }
    if ($year == 1979) {
      $flist{fam79} = "unzip -c /archive/psid_all/documentation/79doctxt.zip |" ;
      $vlist{fam79} .= "$vname:"
    }
    if ($year == 1980) {
      $flist{fam80} = "unzip -c /archive/psid_all/documentation/80doctxt.zip |" ;
      $vlist{fam80} .= "$vname:"
    }
    if ($year == 1981) {
      $flist{fam81} = "unzip -c /archive/psid_all/documentation/81doctxt.zip |" ;
      $vlist{fam81} .= "$vname:"
    }
    if ($year == 1982) {
      $flist{fam82} = "unzip -c /archive/psid_all/documentation/82doctxt.zip |" ;
      $vlist{fam82} .= "$vname:"
    }
    if ($year == 1983) {
      $flist{fam83} = "unzip -c /archive/psid_all/documentation/83doctxt.zip |" ;
      $vlist{fam83} .= "$vname:"
    }
    if ($year == 1984) {
      $flist{fam84} = "unzip -c /archive/psid_all/documentation/84doctxt.zip |" ;
      $vlist{fam84} .= "$vname:"
    }
    if ($year == 1985) {
      $flist{fam85} = "unzip -c /archive/psid_all/documentation/85doctxt.zip |" ;
      $vlist{fam85} .= "$vname:"
    }
    if ($year == 1986) {
      $flist{fam86} = "unzip -c /archive/psid_all/documentation/86doctxt.zip |" ;
      $vlist{fam86} .= "$vname:"
    }
    if ($year == 1987) {
      $flist{fam87} = "unzip -c /archive/psid_all/documentation/87doctxt.zip |" ;
      $vlist{fam87} .= "$vname:"
    }
    if ($year == 1988) {
      $flist{fam88} = "unzip -c /archive/psid_all/documentation/88doctxt.zip |" ;
      $vlist{fam88} .= "$vname:"
    }
    if ($year == 1989) {
      $flist{fam89} = "unzip -c /archive/psid_all/documentation/89doctxt.zip |" ;
      $vlist{fam89} .= "$vname:"
    }
    if ($year == 1990) {
      $flist{fam90} = "unzip -c /archive/psid_all/documentation/90doctxt.zip |" ;
      $vlist{fam90} .= "$vname:"
    }
    if ($year == 1991) {
      $flist{fam91} = "unzip -c /archive/psid_all/documentation/91doctxt.zip |" ;
      $vlist{fam91} .= "$vname:"
    }
    if ($year == 1992) {
      $flist{fam92} = "unzip -c /archive/psid_all/documentation/92doctxt.zip |" ;
      $vlist{fam92} .= "$vname:"
    }
  }
}
close(VLIST);
# File loop: individual cross-year file and each family year file
foreach $j (sort(keys(%flist))) {
  $curfile = $flist{$j} ;
  @vars = split(/:/,$vlist{$j});
  @varno = sort {$a <=> $b} @vars ;
  print "\n\nRead $j: @vars\n";
  print "Sorted $j: @varno \n\n";
  # Variable loop within file
  $i = 0 ;
  $curvar = $varno[$i];
  open(CURFILE,"$curfile") ;
  if ($j eq "fam6878") {
    while ($line = <CURFILE>) {
      CLABEL: {
        while ($curvar < 1100) {
          $curvar = $varno[$i];
          if ($line =~ /^\s*$curvar/) {
            print "Found $curvar.\n";
            $i++ ;
            print CODEBOOK $line ;
            $k = 0;
            until ((($line = <CURFILE>) =~ /^\s{0,4}[0-9]/) || ($k == 1000)) {
              print CODEBOOK $line ;
              $k++ ;
            }
            goto CLABEL ;
          }
          $line = <CURFILE>;
        }
      }
      BLABEL: {
        if ($i <= $#varno) {
          $curvar = $varno[$i];
          if ($line =~ /^\s*\($curvar\)/) {
            print "Found $curvar.\n";
            $i++ ;
            print CODEBOOK $prevline;
            $k = 0 ;
            until ((($prevline = $line) && (($line = <CURFILE>) =~ /^\s{0,4}[0-9]/) || ($k == 1000))) {
              print CODEBOOK $prevline ;
              $k++ ;
            }
            goto BLABEL ;
          }
          else {
            $prevline = $line ;
            $line = <CURFILE>;
            goto BLABEL ;
          }
        }
      }
    }
  }
  if ($j ne "fam6878") {
    while ($line = <CURFILE>)
    {
      ALABEL: {
        if ($i <= $#varno) {
          $test = $line ;
          $test =~ s/V/ / ;
          # Check if codebook should include variable by comparing it to next in
          # the list of variables
          if (($test =~ /^\s+$curvar/) && (($test =~ /TLOC=/) ||
              ($test =~ /Name=/))) {
            print "Found $curvar.\n";
            $i++ ;
            $curvar = $varno[$i] ;
            # Read and write all codebook lines until reach the next variable
            print CODEBOOK $line ;
            $k = 0 ;
            until (((($line = <CURFILE>) =~ /TLOC=/) || ($line =~ /Name=/)) || ($k == 1000)) {
              print CODEBOOK $line ;
              $k++ ;
            }
            goto ALABEL ;
          }
        }
      }