Chapter 4. Data ﬁles
importing from plain text, the program oﬀers limited handling of character (string) data: if
a given column contains character data only, consecutive numeric codes are substituted for
the strings, and once the import is complete a table is printed showing the correspondence
between the strings and the codes.
Dates (or observation labels): Optionally, the ﬁrst column may contain strings such as dates,
or labels for cross-sectional observations. Such strings have a maximum of 15 characters (as
with variable names, longer strings will be truncated). A column of this sort should be headed
with the string obs or date, or the ﬁrst row entry may be left blank.
For dates to be recognized as such, the date strings should adhere to one or other of a set of
speciﬁc formats, as follows. For annual data: 4-digit years. For quarterly data: a 4-digit year,
followed by a separator (either a period, a colon, or the letter Q), followed by a 1-digit quarter.
Examples: 1997.1, 2002:3, 1947Q1. For monthly data: a 4-digit year, followed by a period or
a colon, followed by a two-digit month. Examples: 1997.01, 2002:10.
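For instance, the first lines of a quarterly plain-text file following these conventions might look as below (series names and values purely illustrative):

```
obs     CONS   INC
1990Q1  150.3  210.7
1990Q2  152.1  213.4
1990Q3  153.8  214.0
```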
Plain text (“CSV”) ﬁles can use comma, space, tab or semicolon as the column separator. When you
open such a ﬁle via the GUI you are given the option of specifying the separator, though in most
cases it should be detected automatically.
If you use a spreadsheet to prepare your data you are able to carry out various transformations of
the “raw” data with ease (adding things up, taking percentages or whatever): note, however, that
you can also do this sort of thing easily—perhaps more easily—within gretl, by using the tools
under the “Add” menu.
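For instance, transformations of this sort can be sketched in a gretl script using genr (the series names x and y are hypothetical, assumed present in the open dataset):

```
genr total = x + y             # add things up
genr xshare = 100 * x / total  # express x as a percentage of the total
```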
Appending imported data
You may wish to establish a dataset piece by piece, by incremental importation of data from other
sources. This is supported via the “File, Append data” menu items: gretl will check the new data for
conformability with the existing dataset and, if everything seems OK, will merge the data. You can
add new variables in this way, provided the data frequency matches that of the existing dataset. Or
you can append new observations for data series that are already present; in this case the variable
names must match up correctly. Note that by default (that is, if you choose “Open data” rather
than “Append data”), opening a new data file closes the current one.
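In a script the same incremental approach might look like this (the file names are hypothetical):

```
open base.gdt      # the current dataset
append extra.csv   # merged if the new data are conformable
```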
Using the built-in spreadsheet
Under the “File, New data set” menu you can choose the sort of dataset you want to establish (e.g.
quarterly time series, cross-sectional). You will then be prompted for starting and ending dates (or
observation numbers) and the name of the ﬁrst variable to add to the dataset. After supplying this
information you will be faced with a simple spreadsheet into which you can type data values. In
the spreadsheet window, clicking the right mouse button will invoke a popup menu which enables
you to add a new variable (column), to add an observation (append a row at the foot of the sheet),
or to insert an observation at the selected point (move the data down and insert a blank row).
Once you have entered data into the spreadsheet you import these into gretl’s workspace using the
spreadsheet’s “Apply changes” button.
Please note that gretl’s spreadsheet is quite basic and has no support for functions or formulas.
Data transformations are done via the “Add” or “Variable” menus in the main window.
Selecting from a database
Another alternative is to establish your dataset by selecting variables from a database.
Begin with the “File, Databases” menu item. This has four forks: “Gretl native”, “RATS 4”, “PcGive”
and “On database server”. You should be able to find the file fedstl.bin in the file selector that
opens if you choose the “Gretl native” option since this file, which contains a large collection of US
macroeconomic time series, is supplied with the distribution.
You won’t ﬁnd anything under “RATS 4” unless you have purchased RATS data.
If you do possess
RATS data you should go into the “Tools, Preferences, General” dialog, select the Databases tab,
and ﬁll in the correct path to your RATS ﬁles.
If your computer is connected to the internet you should ﬁnd several databases (at Wake Forest
University) under “On database server”. You can browse these remotely; you also have the option
of installing them onto your own computer. The initial remote databases window has an item
showing, for each ﬁle, whether it is already installed locally (and if so, if the local version is up to
date with the version at Wake Forest).
Assuming you have managed to open a database you can import selected series into gretl’s workspace
by using the “Series, Import” menu item in the database window, or via the popup menu that
appears if you click the right mouse button, or by dragging the series into the program’s main window.
Creating a gretl data ﬁle independently
It is possible to create a data ﬁle in one or other of gretl’s own formats using a text editor or
software tools such as awk, sed or perl. This may be a good choice if you have large amounts
of data already in machine readable form. You will, of course, need to study these data formats
(XML-based or “traditional”) as described in Appendix A.
4.4 Structuring a dataset
Once your data are read by gretl, it may be necessary to supply some information on the nature of
the data. We distinguish between three kinds of datasets:
1. Cross section
2. Time series
3. Panel data
The primary tool for doing this is the “Data, Dataset structure” menu entry in the graphical inter-
face, or the setobs command for scripts and the command-line interface.
Cross sectional data
By a cross section we mean observations on a set of “units” (which may be ﬁrms, countries, indi-
viduals, or whatever) at a common point in time. This is the default interpretation for a data ﬁle:
if there is insuﬃcient information to interpret data as time-series or panel data, they are automat-
ically interpreted as a cross section. In the unlikely event that cross-sectional data are wrongly
interpreted as time series, you can correct this by selecting the “Data, Dataset structure” menu
item. Click the “cross-sectional” radio button in the dialog box that appears, then click “Forward”.
Click “OK” to conﬁrm your selection.
Time series data
When you import data from a spreadsheet or plain text ﬁle, gretl will make fairly strenuous eﬀorts
to glean time-series information from the ﬁrst column of the data, if it looks at all plausible that
such information may be present. If time-series structure is present but not recognized, again you
can use the “Data, Dataset structure” menu item. Select “Time series” and click “Forward”; select the
appropriate data frequency and click “Forward” again; then select or enter the starting observation
and click “Forward” once more. Finally, click “OK” to conﬁrm the time-series interpretation if it is
correct (or click “Back” to make adjustments if need be).
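The script equivalent is the setobs command, giving the data frequency and the starting observation; for example, for quarterly data beginning in the first quarter of 1990:

```
setobs 4 1990:1 --time-series
```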
Besides the basic business of getting a data set interpreted as time series, further issues may arise
relating to the frequency of time-series data. In a gretl time-series data set, all the series must
have the same frequency. Suppose you wish to make a combined dataset using series that, in their
original state, are not all of the same frequency. For example, some series are monthly and some
are quarterly. Your first step is to formulate a strategy: do you want to end up with a quarterly or a monthly data
set? A basic point to note here is that “compacting” data from a higher frequency (e.g. monthly) to
a lower frequency (e.g. quarterly) is usually unproblematic. You lose information in doing so, but
in general it is perfectly legitimate to take (say) the average of three monthly observations to create
a quarterly observation. On the other hand, “expanding” data from a lower to a higher frequency is
not, in general, a valid operation.
In most cases, then, the best strategy is to start by creating a data set of the lower frequency, and
then to compact the higher frequency data to match. When you import higher-frequency data from
a database into the current data set, you are given a choice of compaction method (average, sum,
start of period, or end of period). In most instances “average” is likely to be appropriate.
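In a script, a compaction method can be specified when importing a series from an open database via the data command (the series name here is hypothetical):

```
open fedstl.bin                 # open a native gretl database
data indpro --compact=average   # compact a monthly series to the dataset's frequency by averaging
```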
You can also import lower-frequency data into a high-frequency data set, but this is generally not
recommended. What gretl does in this case is simply replicate the values of the lower-frequency
series as many times as required. For example, suppose we have a quarterly series with the value
35.5 in 1990:1, the ﬁrst quarter of 1990. On expansion to monthly, the value 35.5 will be assigned
to the observations for January, February and March of 1990. The expanded variable is therefore
useless for ﬁne-grained time-series analysis, outside of the special case where you know that the
variable in question does in fact remain constant over the sub-periods.
When the current data frequency is appropriate, gretl oﬀers both “Compact data” and “Expand
data” options under the “Data” menu. These options operate on the whole data set, compacting or
expanding all series. They should be considered “expert” options and should be used with caution.
Panel data
Panel data are inherently three dimensional—the dimensions being variable, cross-sectional unit,
and time-period. For example, a particular number in a panel data set might be identiﬁed as the
observation on capital stock for General Motors in 1980. (A note on terminology: we use the
terms “cross-sectional unit”, “unit” and “group” interchangeably below to refer to the entities that
compose the cross-sectional dimension of the panel. These might, for instance, be firms, countries
or individuals.)
For representation in a textual computer ﬁle (and also for gretl’s internal calculations) the three
dimensions must somehow be ﬂattened into two. This “ﬂattening” involves taking layers of the
data that would naturally stack in a third dimension, and stacking them in the vertical dimension.
gretl always expects data to be arranged “by observation”, that is, such that each row represents an
observation (and each variable occupies one and only one column). In this context the flattening of
a panel data set can be done in either of two ways:
Stacked time series: the successive vertical blocks each comprise a time series for a given unit.
Stacked cross sections: the successive vertical blocks each comprise a cross-section for a given period.
You may input data in whichever arrangement is more convenient. Internally, however, gretl always
stores panel data in the form of stacked time series.
4.5 Panel data speciﬁcs
When you import panel data into gretl from a spreadsheet or comma separated format, the panel
nature of the data will not be recognized automatically (most likely the data will be treated as
“undated”). A panel interpretation can be imposed on the data using the graphical interface or via
the setobs command.
In the graphical interface, use the menu item “Data, Dataset structure”. In the ﬁrst dialog box
that appears, select “Panel”. In the next dialog you have a three-way choice. The ﬁrst two options,
“Stacked time series” and “Stacked cross sections” are applicable if the data set is already organized
in one of these two ways. If you select either of these options, the next step is to specify the number
of cross-sectional units in the data set. The third option, “Use index variables”, is applicable if the
data set contains two variables that index the units and the time periods respectively; the next step
is then to select those variables. For example, a data ﬁle might contain a country code variable and
a variable representing the year of the observation. In that case gretl can reconstruct the panel
structure of the data regardless of how the observation rows are organized.
The setobs command has options that parallel those in the graphical interface. If suitable index
variables are available you can do, for example
setobs unitvar timevar --panel-vars
where unitvar is a variable that indexes the units and timevar is a variable indexing the periods.
Alternatively you can use the form setobs freq 1:1 structure, where freq is replaced by the “block
size” of the data (that is, the number of periods in the case of stacked time series, or the number
of units in the case of stacked cross-sections) and structure is either --stacked-time-series or
--stacked-cross-section. Two examples are given below: the ﬁrst is suitable for a panel in
the form of stacked time series with observations from 20 periods; the second for stacked cross
sections with 5 units.
setobs 20 1:1 --stacked-time-series
setobs 5 1:1 --stacked-cross-section
Panel data arranged by variable
Publicly available panel data sometimes come arranged “by variable.” Suppose we have data on two
variables, x1 and x2, for each of 50 states in each of 5 years (giving a total of 250 observations
per variable). One textual representation of such a data set would start with a block for x1, with
50 rows corresponding to the states and 5 columns corresponding to the years. This would be
followed, vertically, by a block with the same structure for variable x2. A fragment of such a data
ﬁle is shown below, with quinquennial observations 1965–1985. Imagine the table continued for
48 more states, followed by another 50 rows for variable x2.
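A sketch of how such a fragment might look (the state codes and values are purely illustrative):

```
x1
     1965   1970   1975   1980   1985
AR   100.0  110.5  118.7  131.2  160.4
AZ   100.0  104.3  113.8  120.9  140.6
```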
If a data file with this sort of structure is read into gretl,
the program will interpret the columns as
distinct variables, so the data will not be usable “as is.” But there is a mechanism for correcting the
situation, namely the stack function within the genr command.
Consider the first data column in the fragment above: the first 50 rows of this column constitute a
cross-section for the variable x1 in the year 1965. If we could create a new variable by stacking the
first 50 entries in the second column underneath the first 50 entries in the first, we would be on the
way to making a data set “by observation” (in the first of the two forms mentioned above, stacked
cross-sections). That is, we’d have a column comprising a cross-section for x1 in 1965, followed by
a cross-section for the same variable in 1970. (Note that you will have to modify such a data file
slightly before it can be read at all. The line containing the variable name (in this example x1) will
have to be removed, and so will the initial row containing the years, otherwise they will be taken as
numerical data.)
The following gretl script illustrates how we can accomplish the stacking, for both x1 and x2. We
assume that the original data ﬁle is called panel.txt, and that in this ﬁle the columns are headed
with “variable names” v1, v2, ..., v5. (The columns are not really variables, but in the ﬁrst instance
we “pretend” that they are.)
genr x1 = stack(v1..v5) --length=50
genr x2 = stack(v1..v5) --offset=50 --length=50
setobs 50 1:1 --stacked-cross-section
store panel.gdt x1 x2
The second line illustrates the syntax of the stack function. The double dots within the parentheses
indicate a range of variables to be stacked: here we want to stack all 5 columns (for all 5 years).4
The full data set contains 100 rows; in the stacking of variable x1 we wish to read only the ﬁrst 50
rows from each column: we achieve this by adding --length=50. Note that if you want to stack a
non-contiguous set of columns you can give a comma-separated list of variable names, as in
genr x = stack(v1,v3,v5)
or you can provide within the parentheses the name of a previously created list (see chapter 14).
On line 3 we do the stacking for variable x2. Again we want a length of 50 for the components of
the stacked series, but this time we want gretl to start reading from the 50th row of the original
data, and we specify --offset=50. Line 4 imposes a panel interpretation on the data; finally, we
save the data in gretl format, with the panel interpretation, discarding the original “variables” v1
through v5.
The illustrative script above is appropriate when the number of variables to be processed is small.
When there are many variables in the data set it will be more efficient to use a command loop to
accomplish the stacking, as shown in the following script. The setup is presumed to be the same
as in the previous section (50 units, 5 periods), but with 20 variables rather than 2.
loop for i=1..20
  genr k = ($i - 1) * 50
  genr x$i = stack(v1..v5) --offset=k --length=50
endloop
setobs 50 1:1 --stacked-cross-section
store panel.gdt x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 \
x11 x12 x13 x14 x15 x16 x17 x18 x19 x20
Panel data marker strings
It can be helpful with panel data to have the observations identiﬁed by mnemonic markers. A
special function in the genr command is available for this purpose.
In the example above, suppose all the states are identiﬁed by two-letter codes in the left-most
column of the original dataﬁle. When the stacking operation is performed, these codes will be
stacked along with the data values. If the ﬁrst row is marked AR for Arkansas, then the marker AR
will end up being shown on each row containing an observation for Arkansas. That’s all very well,
but these markers don’t tell us anything about the date of the observation. To rectify this we could
You can also specify a list of series using the wildcard ‘*’; for example stack(v*) would stack all series whose
names begin with ‘v’.
genr time
genr year = 1960 + (5 * time)
genr markers = "%s:%d", marker, year
The first line generates a 1-based index representing the period of each observation, and the second
line uses the time variable to generate a variable representing the year of the observation. The
third line contains this special feature: if (and only if) the name of the new “variable” to generate is
markers, the portion of the command following the equals sign is taken as a C-style format string
(which must be wrapped in double quotes), followed by a comma-separated list of arguments.
The arguments will be printed according to the given format to create a new set of observation
markers. Valid arguments are either the names of variables in the dataset, or the string marker
which denotes the pre-existing observation marker. The format speciﬁers which are likely to be
useful in this context are %s for a string and %d for an integer. Strings can be truncated: for
example %.3s will use just the ﬁrst three characters of the string. To chop initial characters oﬀ
an existing observation marker when constructing a new one, you can use the syntax marker + n,
where n is a positive integer: in this case the first n characters will be skipped.
After the commands above are processed, then, the observation markers will look like, for example,
AR:1965, where the two-letter state code and the year of the observation are spliced together with
a colon.
Panel dummy variables
In a panel study you may wish to construct dummy variables of one or both of the following sorts:
(a) dummies as unique identifiers for the units or groups, and (b) dummies as unique identifiers for
the time periods. The former may be used to allow the intercept of the regression to diﬀer across
the units, the latter to allow the intercept to diﬀer across periods.
Two special functions are available to create such dummies. These are found under the “Add”
menu in the GUI, or under the genr command in script mode or gretlcli.
1. “unit dummies” (script command genr unitdum). This command creates a set of dummy
variables identifying the cross-sectional units. The variable du_1 will have value 1 in each
row corresponding to a unit 1 observation, 0 otherwise; du_2 will have value 1 in each row
corresponding to a unit 2 observation, 0 otherwise; and so on.
2. “time dummies” (script command genr timedum). This command creates a set of dummy
variables identifying the periods. The variable dt_1 will have value 1 in each row
corresponding to a period 1 observation, 0 otherwise; dt_2 will have value 1 in each row corresponding
to a period 2 observation, 0 otherwise; and so on.
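As a sketch, the unit dummies might be used to let the intercept differ across units in an OLS regression (the series names y and x are hypothetical; du_1 is omitted to avoid collinearity with the constant):

```
genr unitdum
ols y const x du_2 du_3 du_4
```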
If a panel data set has the YEAR of the observation entered as one of the variables you can create a
periodic dummy to pick out a particular year, e.g. genr dum = (YEAR=1960). You can also create
periodic dummy variables using the modulus operator, %. For instance, to create a dummy with
value 1 for the first observation and every thirtieth observation thereafter, 0 otherwise, do
genr dum = ((index-1) % 30) = 0
Lags, diﬀerences, trends
If the time periods are evenly spaced you may want to use lagged values of variables in a panel
regression (but see also chapter 19); you may also wish to construct first differences of variables of
interest.
Once a dataset is identiﬁed as a panel, gretl will handle the generation of such variables correctly.
For example the command genr x1_1 = x1(-1) will create a variable that contains the ﬁrst lag
Chapter 4. Data ﬁles
of x1 where available, and the missing value code where the lag is not available (e.g. at the start of
the time series for each group). When you run a regression using such variables, the program will
automatically skip the missing observations.
When a panel data set has a fairly substantial time dimension, you may wish to include a trend in
the analysis. The command genr time creates a variable named time which runs from 1 to T for
each unit, where T is the length of the time-series dimension of the panel. If you want to create an
index that runs consecutively from 1 to m × T, where m is the number of units in the panel, use the
command genr index.
Basic statistics by unit
gretl contains functions which can be used to generate basic descriptive statistics for a given vari-
able, on a per-unit basis; these are pnobs() (number of valid cases), pmin() and pmax() (minimum
and maximum) and pmean() and psd() (mean and standard deviation).
As a brief illustration, suppose we have a panel data set comprising 8 time-series observations on
each of N units or groups. Then the command
genr pmx = pmean(x)
creates a series of this form: the ﬁrst 8 values (corresponding to unit 1) contain the mean of x for
unit 1, the next 8 values contain the mean for unit 2, and so on. The psd() function works in a
similar manner. The sample standard deviation for group i is computed as

  s_i = sqrt( (1 / (T_i − 1)) * sum_t (x_t − x̄_i)² )

where T_i denotes the number of valid observations on x for the given unit, x̄_i denotes the group
mean, and the summation is across valid observations for the group. If T_i < 2, however, the
standard deviation is recorded as 0.
One particular use of psd() may be worth noting. If you want to form a sub-sample of a panel that
contains only those units for which the variable x is time-varying, you can either use

smpl (pmin(x) < pmax(x)) --restrict

or

smpl (psd(x) > 0) --restrict
4.6 Missing data values
Representation and handling
Missing values are represented internally as DBL_MAX, the largest floating-point number that can be
represented on the system (which is likely to be at least 10 to the power 300, and so should not
be confused with legitimate data values). In a native-format data ﬁle they should be represented
as NA. When importing CSV data gretl accepts several common representations of missing values
including −999, the string NA (in upper or lower case), a single dot, or simply a blank cell. Blank cells
should, of course, be properly delimited, e.g. 120.6,,5.38, in which the middle value is presumed
to be missing.
In calculating descriptive statistics (mean, standard deviation, etc.) under the summary com-
mand, missing values are simply skipped and the sample size adjusted appropriately.
Chapter 4. Data ﬁles
In running regressions gretl ﬁrst adjusts the beginning and end of the sample range, trun-
cating the sample if need be. Missing values at the beginning of the sample are common in
time series work due to the inclusion of lags, first differences and so on; missing values at the
end of the range are not uncommon due to diﬀerential updating of series and possibly the
inclusion of leads.
If gretl detects any missing values “inside” the (possibly truncated) sample range for a regression,
the result depends on the character of the dataset and the estimator chosen. In many cases, the
program will automatically skip the missing observations when calculating the regression results.
In this situation a message is printed stating how many observations were dropped. On the other
hand, the skipping of missing observations is not supported for all procedures: exceptions include
all autoregressive estimators, system estimators such as SUR, and nonlinear least squares. In the
case of panel data, the skipping of missing observations is supported only if their omission leaves
a balanced panel. If missing observations are found in cases where they are not supported, gretl
gives an error message and refuses to produce estimates.
Manipulating missing values
Some special functions are available for the handling of missing values. The boolean function
missing() takes the name of a variable as its single argument; it returns a series with value 1 for
each observation at which the given variable has a missing value, and value 0 otherwise (that is, if
the given variable has a valid value at that observation). The function ok() is complementary to
missing; it is just a shorthand for !missing (where ! is the boolean NOT operator). For example,
one can count the missing values for variable x using
scalar nmiss_x = sum(missing(x))
The function zeromiss(), which again takes a single series as its argument, returns a series where
all zero values are set to the missing code. This should be used with caution—one does not want
to confuse missing values and zeros—but it can be useful in some contexts. For example, one can
determine the ﬁrst valid observation for a variable x using
scalar x0 = min(zeromiss(time * ok(x)))
The function misszero() does the opposite of zeromiss, that is, it converts all missing values to
zeros.
It may be worth commenting on the propagation of missing values within genr formulae. The
general rule is that in arithmetical operations involving two variables, if either of the variables has
a missing value at observation t then the resulting series will also have a missing value at t. The
one exception to this rule is multiplication by zero: zero times a missing value produces zero (since
this is mathematically valid regardless of the unknown value).
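A brief sketch of the rule (the series x and y are hypothetical, with x containing some NAs):

```
genr z = x + y   # NA at t whenever x or y is NA at t
genr w = 0 * x   # zero everywhere, even where x is NA
```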
4.7 Maximum size of data sets
Basically, the size of data sets (both the number of variables and the number of observations per
variable) is limited only by the characteristics of your computer. gretl allocates memory dynami-
cally, and will ask the operating system for as much memory as your data require. Obviously, then,
you are ultimately limited by the size of RAM.
Aside from the multiple-precision OLS option, gretl uses double-precision ﬂoating-point numbers
throughout. The size of such numbers in bytes depends on the computer platform, but is typically
eight. To give a rough notion of magnitudes, suppose we have a data set with 10,000 observations
on 500 variables. That’s 5 million ﬂoating-point numbers or 40 million bytes. If we deﬁne the
Chapter 4. Data ﬁles
megabyte (MB) as 1024 × 1024 bytes, as is standard in talking about RAM, it’s slightly over 38 MB.
The program needs additional memory for workspace, but even so, handling a data set of this size
should be quite feasible on a current PC, which at the time of writing is likely to have at least 256
MB of RAM.
If RAM is not an issue, there is one further limitation on data size (though it’s very unlikely to
be a binding constraint). That is, variables and observations are indexed by signed integers, and
on a typical PC these will be 32-bit values, capable of representing a maximum positive value of
2^31 − 1 = 2,147,483,647.
The limits mentioned above apply to gretl’s “native” functionality. There are tighter limits with
regard to two third-party programs that are available as add-ons to gretl for certain sorts of time-
series analysis including seasonal adjustment, namely TRAMO/SEATS and X-12-ARIMA. These
programs employ a fixed-size memory allocation, and can’t handle series of more than 600
observations.
4.8 Data ﬁle collections
If you’re using gretl in a teaching context you may be interested in adding a collection of data ﬁles
and/or scripts that relate speciﬁcally to your course, in such a way that students can browse and
access them easily.
There are three ways to access such collections of files:
For data ﬁles: select the menu item “File, Open data, Sample ﬁle”, or click on the folder icon
on the gretl toolbar.
For script ﬁles: select the menu item “File, Script ﬁles, Practice ﬁle”.
When a user selects one of the items:
The data or script files included in the gretl distribution are automatically shown (this includes
files relating to Ramanathan’s Introductory Econometrics and Greene’s Econometric Analysis).
The program looks for certain known collections of data ﬁles available as optional extras,
for instance the datafiles from various econometrics textbooks (Davidson and MacKinnon,
Gujarati, Stock and Watson, Verbeek, Wooldridge) and the Penn World Table (PWT 5.6). (See
the data page at the gretl website for information on these collections.) If the additional files
are found, they are added to the selection windows.
The program then searches for valid ﬁle collections (not necessarily known in advance) in
these places: the “system” data directory, the system script directory, the user directory,
and all ﬁrst-level subdirectories of these. For reference, typical values for these directories
are shown in Table 4.1. (Note that PERSONAL is a placeholder that is expanded by Windows,
corresponding to “My Documents” on English-language systems.)
[Table 4.1: Typical locations for file collections — rows for the system data directory, the system
script directory, and the user directory]
Any valid collections will be added to the selection windows. So what constitutes a valid ﬁle collec-
tion? This comprises either a set of data ﬁles in gretl XML format (with the .gdt suﬃx) or a set of
Chapter 4. Data ﬁles
script ﬁles containing gretl commands (with .inp suﬃx), in each case accompanied by a “master
ﬁle” or catalog. The gretl distribution contains several example catalog ﬁles, for instance the ﬁle
descriptions in the misc sub-directory of the gretl data directory and ps_descriptions in the
misc sub-directory of the scripts directory.
If you are adding your own collection, data catalogs should be named descriptions and script
catalogs should be named ps_descriptions. In each case the catalog should be placed (along
with the associated data or script files) in its own specific sub-directory (e.g. /usr/share/gretl/
The catalog files are plain text; if they contain non-ASCII characters they must be encoded as
UTF-8. The syntax of such files is straightforward. Here, for example, are the first few lines of gretl’s
“misc” data catalog:
# Gretl: various illustrative datafiles
"arma","artificial data for ARMA script example"
"ects_nls","Nonlinear least squares example"
"hamilton","Prices and exchange rate, U.S. and Italy"
The ﬁrst line, which must start with a hash mark, contains a short name, here “Gretl”, which
will appear as the label for this collection’s tab in the data browser window, followed by a colon,
followed by an optional short description of the collection.
Subsequent lines contain two elements, separated by a comma and wrapped in double quotation
marks. The first is a datafile name (leave off the .gdt suffix here) and the second is a short
description of the content of that datafile. There should be one such line for each datafile in the
collection.
A script catalog file looks very similar, except that there are three fields in the file lines: a filename
(without its .inp suffix), a brief description of the econometric point illustrated in the script, and
a brief indication of the nature of the data used. Again, here are the first few lines of the supplied
“misc” script catalog:
# Gretl: various sample scripts
"arma","ARMA modeling","artificial data"
"ects_nls","Nonlinear least squares (Davidson)","artificial data"
"leverage","Influential observations","artificial data"
If you want to make your own data collection available to users, these are the steps:
1. Assemble the data, in whatever format is convenient.
2. Convert the data to gretl format and save as gdt files. It is probably easiest to convert the data
by importing them into the program from plain text, CSV, or a spreadsheet format (MS Excel
or Gnumeric) then saving them. You may wish to add descriptions of the individual variables
(the “Variable, Edit attributes” menu item), and add information on the source of the data (the
“Data, Edit info” menu item).
3. Write a descriptions ﬁle for the collection using a text editor.
4. Put the datafiles plus the descriptions file in a subdirectory of the gretl data directory (or user
directory).
5. If the collection is to be distributed to other people, package the data files and catalog in some
suitable manner, e.g. as a zip file.
If you assemble such a collection, and the data are not proprietary, we would encourage you to
submit the collection for packaging as a gretl optional extra.