This code shows how to create an open-ended do loop (a while loop) in Stata 11, but it should work in most versions of Stata. The dataset for this sample code consists of hourly observations of auctions. It includes the offer price, seller name, and quantity of the item offered.
The same auction shows up each hour until the item is sold or the auction expires at 48 hours. I am attempting to identify the length of a given group (batch) of auctions, e.g. four separate auctions for 20 bags of sugar at $20 offered by joe.
The problem: the same person may post a new auction with the same offer price and quantity just as the original batch of auctions expires. The two batches then look like a single batch that runs for more than 48 hours, so I must identify those batches that show up for more than 48 hours and split them. The sample code below does this. There is always room for improvement and this code could be made more efficient, but it worked for me and hopefully it will help you.
NOTE: Commands are delimited using ; (this requires #delimit ; at the top of the do-file)
Comments begin with * and end with ;
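A minimal sketch of how the semicolon delimiter is switched on and off, assuming the code is run from a do-file (#delimit has no effect when typed interactively). The display command is only a placeholder to show a command spanning several lines.

```stata
* switch the command delimiter from carriage return to semicolon;
#delimit ;
* a comment may now span several lines
  and ends only at the semicolon;
display
    "hello";
* switch back to the default delimiter;
#delimit cr
```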
* this line converts two numeric (float) variables into new string variables: buy and q;
tostring buyamount quantity, gen(buy q);
* this line combines the two new string variables plus the seller name into a new string;
gen batch1 = seller+buy+q;
* the next two lines drop the temporary string variables buy and q to keep things clean;
drop buy;
drop q;
* this sorts the data by time and batch, and keeps only one observation for each batch in each time period;
bysort t batch1: drop if _n>1;
* here the ordering is changed to look at the time periods by batch;
sort batch1 t;
* this generates a variable n, expressing t in hours (t is assumed to be a %tc datetime, i.e. milliseconds since 01jan1960): n=1 means 01jan1960 01:00:00, n=2 means 01jan1960 02:00:00;
gen n = hours(t);
* this generates a variable holding the maximum value of n for each batch, the (t) keeps each batch ordered by time so that n[_N] is its last and largest value;
bysort batch1 (t): gen T = n[_N];
* this generates a variable counting the difference between the maximum value of n and the current time period, i.e. 12 hours, 11 hours, 10 hours, 9 hours, ...;
gen z = T-n;
* this just creates a duplicate of the batch id that can be adjusted without changing the original;
gen batch2 = batch1;
* this creates a new batch id for observations of batches that are listed for more than 48 hours: observations more than 48 hours before the last observation of the batch get the prefix i, so joe202 becomes ijoe202,;
* while observations within 48 hours of the last one stay joe202;
replace batch2 = "i"+batch1 if z>48;
* cleaning up by dropping unnecessary variables;
drop T;
drop z;
* generates two new variables for use in the loop. i must initially satisfy the condition in the while statement (i>=49), otherwise the loop will be skipped;
gen i2 = 0;
gen i = 100;
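* this sets the condition for continuing the loop. Stata evaluates i in the first observation, which is fine because i is the same in every observation;
* the open brace marks the beginning of the loop and must be part of the same command as while;
while i>=49 {;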
* i2 is redefined in each iteration of the loop. To make things easier it is dropped at the beginning of each iteration and redefined later;
drop i2;
* this repeats the process above, now applied to batch2;
sort batch2 t;
bysort batch2: gen T = n[_N];
gen z = T-n;
replace batch2 = "i"+batch2 if z>48;
drop T;
* at this point this identifies whether any batch still spans more than 48 hours, i.e. whether the largest z is greater than 48;
* a new variable needed to be defined rather than altering i, because egen only creates new variables and does not work with replace;
egen i2 = max(z);
drop z;
* this updates the count variable i so that it can be checked against the condition set at the beginning of the loop;
replace i = i2;
* indicates the end of the loop;
};
* cleaning up the extra variables;
drop i2;
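The same open-ended loop can also be sketched with a local macro in place of the helper variables i and i2. This is only a sketch under stated assumptions, not the method used above: the toy dataset (one hypothetical batch, joe2020, observed for 120 consecutive hours) and the macro name maxgap are made up for illustration, and summarize with r(max) stands in for egen. It is meant to be run as a do-file.

```stata
#delimit ;
* toy data (hypothetical): one batch listed for 120 consecutive hours;
clear;
set obs 120;
gen n = _n;
gen batch2 = "joe2020";
* start above the threshold so the loop body runs at least once;
local maxgap = 100;
while `maxgap' > 48 {;
    * hours between each observation and the last observation of its batch;
    bysort batch2 (n): gen z = n[_N] - n;
    * split off observations more than 48 hours before the end of the batch;
    replace batch2 = "i" + batch2 if z > 48;
    * largest gap seen in this pass, used to decide whether to loop again;
    quietly summarize z;
    local maxgap = r(max);
    drop z;
};
* show how many observations each resulting batch holds;
tabulate batch2;
#delimit cr
```

Because the macro lives outside the dataset, nothing has to be dropped and regenerated on each pass, and there are no extra variables to clean up afterwards.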
Created by Michael Morrison