48
Chapter 35. Extending SQL
-- cross-type comparisons int2 vs int8
OPERATOR 1 < (int2, int8) ,
OPERATOR 2 <= (int2, int8) ,
OPERATOR 3 = (int2, int8) ,
OPERATOR 4 >= (int2, int8) ,
OPERATOR 5 > (int2, int8) ,
FUNCTION 1 btint28cmp(int2, int8) ,
-- cross-type comparisons int2 vs int4
OPERATOR 1 < (int2, int4) ,
OPERATOR 2 <= (int2, int4) ,
OPERATOR 3 = (int2, int4) ,
OPERATOR 4 >= (int2, int4) ,
OPERATOR 5 > (int2, int4) ,
FUNCTION 1 btint24cmp(int2, int4) ;
Notice that this definition “overloads” theoperator strategy and support function numbers: each num-
ber occurs multiple times within the family. This is allowed so long as each instance of a particular
number has distinct input data types. The instances that have both input types equal to an operator
class’s input type are the primary operators and support functions for that operator class, and in most
cases shouldbe declared as part of the operator class rather than as loose members of the family.
In a B-tree operator family, all the operators in the family must sort compatibly, meaning that the
transitive laws hold across all the data types supported by the family: “if A = B and B = C, then A =
C”, and“if A < BandB < C, then A< C”. Moreover, implicitor binary coercioncasts betweentypes
represented in the operator family must not change the associated sort ordering. For each operator in
the family there must be a support function having the same two input data types as the operator. It
is recommended that a family be complete, i.e., for each combination of data types, all operators are
included. Each operator class should include just the non-cross-type operators and support function
for its data type.
To build amultiple-data-typehashoperator family, compatible hashsupport functions mustbe created
for each data typesupportedbythefamily. Here compatibilitymeans that the functions are guaranteed
to return the same hash code for any two values that are considered equal by the family’s equality
operators, even whenthe values are of different types. This is usually difficultto accomplishwhen the
types have different physical representations, but it can be done in some cases. Furthermore, casting
avalue from one data type represented in the operator family to another data type also represented
in the operator family via an implicit or binary coercion cast must not change the computed hash
value. Notice that there is only one support function per data type, not one per equality operator. It
is recommended that a family be complete, i.e., provide an equality operator for each combination
of data types. Each operator class should include just the non-cross-type equality operator and the
support function for its data type.
GiST, SP-GiST, and GIN indexes do not have any explicit notion of cross-data-type operations. The
set of operators supported is just whatever the primary support functions for a given operator class
can handle.
Note: Prior to PostgreSQL 8.3, there was no concept of operator families, andso any cross-data-
typeoperators intendedtobeused with an index had tobebound directly intothe index’s operator
class. While this approachstill works, it is deprecated because it makes an index’s dependencies
too broad, and because the planner can handle cross-data-type comparisons more effectively
when both data types have operators in the same operator family.
973
72
Chapter 35. Extending SQL
35.14.6. System Dependencies on Operator Classes
PostgreSQL uses operator classes to infer the properties of operators in more ways than just whether
they can be used with indexes. Therefore, you might want to create operator classes even if you have
no intention of indexing any columns of your data type.
In particular, there are SQL features such as
ORDER BY
and
DISTINCT
that require comparison and
sorting of values. To implement these features on a user-defined data type, PostgreSQL looks for the
default B-tree operator class for the data type. The “equals” member of this operator class defines the
system’s notionof equalityof values for
GROUP BY
and
DISTINCT
,andthe sort ordering imposed by
the operator class defines the default
ORDER BY
ordering.
Comparison of arrays of user-defined types also relies on the semantics defined by the default B-tree
operator class.
If there is no default B-tree operator class for a data type, the system will look for a default hash
operator class. But since that kind of operator class only provides equality, in practice it is only
enough tosupport array equality.
When there is no default operator class for a data type, you will get errors like “could not identify an
ordering operator” if you try to use these SQL features with the data type.
Note: In PostgreSQL versions before 7.4, sorting and grouping operations would implicitly use
operators named
=
,
<
,and
>
.The new behavior of relying on default operator classes avoids
having to make any assumption about the behavior of operators with particular names.
Another important point is that an operator that appears in a hash operator family is a candidate for
hash joins, hash aggregation, and related optimizations. The hash operator family is essential here
since it identifies the hash function(s) to use.
35.14.7. Ordering Operators
Some index access methods (currently, only GiST) support the concept of ordering operators. What
we have been discussing so far are search operators. A search operator is one for which the index
can be searched to find all rows satisfying
WHERE indexed_column operator constant
.Note
that nothing is promised about the order in which the matching rows will be returned. In contrast,
an ordering operator does not restrict the set of rows that can be returned, but instead determines
their order. An ordering operator is one for which the index can be scanned to return rows in the
order represented by
ORDER BY indexed_column operator constant
.The reason for defining
ordering operators that way is that it supports nearest-neighbor searches, if the operator is one that
measures distance. For example, a query like
SELECT
*
FROM places ORDER BY location <-> point ’(101,456)’ LIMIT 10;
finds the ten places closest to a given target point. A GiST index on the location column can do this
efficiently because
<->
is an ordering operator.
While search operators have to return Boolean results, ordering operators usually return some other
type, such as float or numeric for distances. This type is normally not the same as the data type being
indexed. To avoid hard-wiringassumptions about the behavior of differentdata types, thedefinitionof
an ordering operator is required toname a B-tree operator familythatspecifies the sort ordering of the
result data type. As was stated in the previous section, B-tree operator families define PostgreSQL’s
notion of ordering, so this is a natural representation. Since the point
<->
operator returns
float8
,
it could be specified in an operator class creation command like this:
974
73
Chapter 35. Extending SQL
OPERATOR 15
<-> (point, point) FOR ORDER BY float_ops
where
float_ops
is the built-inoperator family thatincludes operations on
float8
.This declaration
states that the index is able to return rows inorder of increasing values of the
<->
operator.
35.14.8. Special Features of Operator Classes
There are twospecialfeatures of operator classes that we have not discussed yet, mainlybecause they
are not useful with the most commonly usedindex methods.
Normally, declaring an operator as a member of an operator class (or family) means that the index
method can retrieve exactly the set of rows that satisfy a
WHERE
condition using the operator. For
example:
SELECT
*
FROM table WHERE integer_column < 4;
can be satisfied exactly by a B-tree index on the integer column. But there are cases where an index
is useful as an inexact guide to the matching rows. For example, if a GiST index stores only bound-
ing boxes for geometric objects, then it cannot exactly satisfy a
WHERE
condition that tests overlap
between nonrectangular objects such as polygons. Yet we could use the index to find objects whose
bounding box overlaps the bounding box of the target object, and then do the exact overlap test only
on the objects found by the index. If this scenarioapplies, the index is saidto be “lossy” for the oper-
ator. Lossy index searches are implemented by having the index method return a recheck flag when a
row might or might not really satisfy the query condition. The core system will then test the original
query condition on the retrieved row to see whether it should be returned as a valid match. This ap-
proach works if the index is guaranteed to return all the required rows, plus perhaps some additional
rows, whichcanbe eliminated by performingtheoriginaloperator invocation. Theindexmethods that
support lossy searches (currently, GiST, SP-GiST and GIN) allow the support functions of individual
operator classes to set the recheck flag, and so this is essentially an operator-class feature.
Consider again the situation where we are storing in the index only the bounding box of a complex
object suchas a polygon. In this case there’s not much value instoringthewhole polygonin the index
entry — we might as well store just a simpler object of type
box
.This situation is expressed by the
STORAGE
option in
CREATE OPERATOR CLASS
:we’d write something like:
CREATE OPERATOR CLASS polygon_ops
DEFAULT FOR TYPE polygon USING gist AS
...
STORAGE box;
At present, only the GiST and GIN index methods support a
STORAGE
type that’s different from
the column data type. The GiST
compress
and
decompress
support routines must deal with data-
type conversion when
STORAGE
is used. In GIN, the
STORAGE
type identifies the type of the “key”
values, which normally is different from the type of the indexed column — for example, an operator
class for integer-array columns might have keys that are just integers. The GIN
extractValue
and
extractQuery
support routines are responsible for extracting keys from indexed values.
35.15. Packaging Related Objects into an Extension
Auseful extension to PostgreSQL typically includes multiple SQL objects; for example, a new data
type willrequire newfunctions, newoperators, and probably newindex operator classes. It is helpful
975
76
Chapter 35. Extending SQL
to collect all these objects into a single package to simplify database management. PostgreSQL calls
such a package an extension. To define an extension, you need at least a script file that contains the
SQL commandsto create the extension’s objects, and acontrolfile that specifies a fewbasic properties
of the extension itself. If the extension includes C code, there will typically also be a shared library
file intowhich the C code has been built. Once you have these files, a simple CREATE EXTENSION
command loads the objects into your database.
The main advantage of using an extension, rather than just running the SQL script to load a bunch
of “loose” objects into your database, is that PostgreSQL will then understand that the objects of the
extension go together. You can drop all the objects with a single DROP EXTENSION command (no
need to maintain a separate “uninstall” script). Even more useful, pg_dump knows that it should not
dump the individual member objects of the extension — it will just include a
CREATE EXTENSION
command in dumps, instead. This vastly simplifies migration to a new version of the extension that
might contain more or different objects than the old version. Note however that you must have the
extension’s control, script, and other files available when loading such a dump into a new database.
PostgreSQL will not let you drop an individual object contained in an extension, except by dropping
the whole extension. Also, while you can change the definition of an extension member object (for
example, via
CREATE OR REPLACE FUNCTION
for afunction), bear in mind thatthemodifieddefini-
tion willnotbedumpedbypg_dump. Such a change is usually onlysensible if you concurrentlymake
the same change in the extension’s script file. (But there are special provisions for tables containing
configuration data; see below.)
The extension mechanism also has provisions for packaging modification scripts that adjust the defi-
nitions of the SQL objects contained in an extension. For example, if version 1.1 of an extension adds
one function and changes the body of another function comparedto 1.0, the extension author can pro-
vide an update scriptthatmakes justthose twochanges. The
ALTER EXTENSION UPDATE
command
can then be used to apply these changes and track which version of the extension is actually installed
in a given database.
The kinds of SQL objects that can be members of an extension are shown in the description of AL-
TER EXTENSION. Notably, objects that are database-cluster-wide, such as databases, roles, and
tablespaces, cannot be extension members since an extension is only known within one database.
(Although an extension script is not prohibited from creating such objects, if it does so they will not
be tracked as part of the extension.) Also notice that while a table can be a member of an extension,
its subsidiary objects such as indexes are not directly considered members of the extension. Another
important point is that schemas can belong to extensions, but not vice versa: an extension as such
has an unqualified name and does not exist “within” any schema. The extension’s member objects,
however, will belong to schemas whenever appropriate for their object types. It may or may not be
appropriate for an extension to own the schema(s) its member objects are within.
35.15.1. Extension Files
The CREATE EXTENSION command relies on a control file for each extension, which must be
named the same as the extension with a suffix of
.control
,and must be placed in the installation’s
SHAREDIR/extension
directory. There must also be at least one SQL script file, which follows the
naming pattern
extension
--
version
.sql
(for example,
foo--1.0.sql
for version
1.0
of exten-
sion
foo
). By default, the script file(s) are also placed in the
SHAREDIR/extension
directory; but
the control file can specify a different directory for the script file(s).
The file format for an extension control file is the same as for the
postgresql.conf
file, namely a
list of
parameter_name = value
assignments, one per line. Blank lines and comments introduced
by
#
are allowed. Be sure to quote any value that is not a single word or number.
Acontrol file can set the following parameters:
976
113
Chapter 35. Extending SQL
directory
(
string
)
The directory containing the extension’s SQL script file(s). Unless an absolute path is given, the
name is relative to the installation’s
SHAREDIR
directory. The default behavior is equivalent to
specifying
directory = ’extension’
.
default_version
(
string
)
The default version of the extension (the one that will be installed if no version is specified in
CREATE EXTENSION
). Although this can be omitted, that will result in
CREATE EXTENSION
failing if no
VERSION
option appears, so you generally don’t want to do that.
comment
(
string
)
Acomment (any string) about the extension. Alternatively, the comment can be set by means of
the COMMENT command in the script file.
encoding
(
string
)
The character set encoding used by the script file(s). This should be specified if the script files
contain any non-ASCII characters. Otherwise the files will be assumed to be in the database
encoding.
module_pathname
(
string
)
The value of this parameter will be substituted for each occurrence of
MODULE_PATHNAME
in the script file(s). If it is not set, no substitution is made. Typically, this is set to
$libdir/
shared_library_name
and then
MODULE_PATHNAME
is used in
CREATE FUNCTION
commands for C-language functions, so that the script files do not need to hard-wire the name
of the shared library.
requires
(
string
)
Alist of names of extensions that this extension depends on, for example
requires = ’foo,
bar’
.Those extensions must be installed before this one can be installed.
superuser
(
boolean
)
If this parameter is
true
(whichisthedefault), onlysuperusers cancreatethe extensionor update
it to a new version. If it is set to
false
,just the privileges required to execute the commands in
the installation or update script are required.
relocatable
(
boolean
)
Anextensionis relocatable if it is possible tomove its contained objects intoa different schema
after initial creation of the extension. The default is
false
,i.e. the extension is not relocatable.
See below for more information.
schema
(
string
)
This parameter canonlybe setfor non-relocatableextensions. Itforces theextensiontobe loaded
into exactly the named schema and not any other. See below for more information.
In additionto the primarycontrol file
extension
.control
,an extension can have secondarycontrol
files named in the style
extension
--
version
.control
.If supplied, these must be located in the
script file directory. Secondary control files follow the same format as the primary control file. Any
parameters setin a secondary controlfile override theprimary controlfile when installing or updating
to thatversion of the extension. However, the parameters
directory
and
default_version
cannot
be set in a secondary control file.
An extension’s SQL script files can contain any SQL commands, except for transaction control com-
mands (
BEGIN
,
COMMIT
,etc) and commands that cannot be executed inside a transaction block (such
as
VACUUM
). This is because the script files are implicitly executed within a transaction block.
977
Documents you may be interested
Documents you may be interested