38
Chapter 6. Data Manipulation
The previous chapter discussed how to create tables and other structures to hold your data. Now it is
time to fill the tables with data. This chapter covers how to insert, update, and delete table data. The
chapter after this will finally explainhow to extract your long-lost data from the database.
6.1. Inserting Data
When a table is created, it contains no data. The first thing to do before a database can be of muchuse
is to insert data. Data is conceptually inserted one row at a time. Of course you can also insert more
than one row, but there is no way to insert less than one row. Even if you know only some column
values, a complete row must be created.
To create a new row, use the INSERT command. The command requires the table name and column
values. For example, consider the products table from Chapter 5:
CREATE TABLE products (
product_no integer,
name text,
price numeric
);
An example command to insert a row would be:
INSERT INTO products VALUES (1, ’Cheese’, 9.99);
The data values are listed in the order inwhich the columns appear in the table, separatedby commas.
Usually, the data values will be literals (constants), but scalar expressions are also allowed.
The above syntax has the drawback that you need to know the order of the columns in the table. To
avoidthis youcan also listthecolumns explicitly. For example, both of the following commands have
the same effect as the one above:
INSERT INTO products (product_no, name, price) VALUES (1, ’Cheese’, 9.99);
INSERT INTO products (name, price, product_no) VALUES (’Cheese’, 9.99, 1);
Many users consider it good practice to always list the column names.
If you don’t have values for all the columns, you can omit some of them. In that case, the columns
will be filled with their default values. For example:
INSERT INTO products (product_no, name) VALUES (1, ’Cheese’);
INSERT INTO products VALUES (1, ’Cheese’);
The second form is a PostgreSQL extension. It fills the columns from the left with as many values as
are given, and the rest will be defaulted.
For clarity, you can also request defaultvalues explicitly, for individual columns or for the entire row:
INSERT INTO products (product_no, name, price) VALUES (1, ’Cheese’, DEFAULT);
INSERT INTO products DEFAULT VALUES;
You can insert multiple rows in a single command:
INSERT INTO products (product_no, name, price) VALUES
81
54
Chapter 6. Data Manipulation
(1, ’Cheese’, 9.99),
(2, ’Bread’, 1.99),
(3, ’Milk’, 2.99);
Tip: When inserting a lot of data at the same time, considering using the COPY command. It
is not as flexible as the INSERT command, but is more efficient. Refer to Section 14.4 for more
information on improving bulk loading performance.
6.2. Updating Data
The modification of data that is already in the database is referred to as updating. You can update
individual rows, alltherows in atable, or a subset of all rows. Each columncanbe updated separately;
the other columns are not affected.
To update existing rows, use the UPDATE command. This requires three pieces of information:
1. The name of the table and column to update
2. The new value of the column
3. Which row(s) to update
Recallfrom Chapter 5 that SQL does not, in general, provide a unique identifier for rows. Therefore it
is not always possible to directly specify which row to update. Instead, you specify which conditions
arow must meet in order to be updated. Only if you have a primary key in the table (independent of
whether you declared it or not) can you reliably address individual rows by choosing a condition that
matches the primarykey. Graphical database access tools rely onthis factto allow you toupdate rows
individually.
For example, this command updates all products that have a price of 5 to have a price of 10:
UPDATE products SET price = 10 WHERE price = 5;
This might cause zero, one, or many rows to be updated. It is not an error to attempt an update that
does not match any rows.
Let’s look at that command in detail. First is the key word
UPDATE
followed by the table name. As
usual, the table name can be schema-qualified, otherwise it is looked up in the path. Next is the key
word
SET
followed by the column name, an equal sign, and the new column value. The new column
value can be any scalar expression, not just a constant. For example, if you want to raise the price of
all products by 10% you could use:
UPDATE products SET price = price
*
1.10;
As you see, the expression for the new value can refer to the existing value(s) in the row. We also
left out the
WHERE
clause. If it is omitted, it means that all rows in the table are updated. If it is
present, only those rows that match the
WHERE
condition are updated. Note that the equals sign in
the
SET
clause is an assignment while the one in the
WHERE
clause is a comparison, but this does not
create any ambiguity. Of course, the
WHERE
conditiondoes not have to be an equalitytest. Many other
operators are available (see Chapter 9). But the expression needs to evaluate to a Boolean result.
82
25
Chapter 6. Data Manipulation
You can update more than one column in an
UPDATE
command by listing more than one assignment
in the
SET
clause. For example:
UPDATE mytable SET a = 5, b = 3, c = 1 WHERE a > 0;
6.3. Deleting Data
So far we have explained how to add data to tables and how to change data. What remains is to
discuss how to remove data that is no longer needed. Just as adding data is only possible in whole
rows, you can only remove entire rows from a table. In the previous section we explained that SQL
does not provide a way to directly address individual rows. Therefore, removing rows can only be
done by specifying conditions that the rows to be removed have to match. If you have a primary key
in the table then you can specify the exact row. But you can also remove groups of rows matching a
condition, or you can remove all rows in the table at once.
You use the DELETE command to remove rows; the syntax is very similar to the
UPDATE
command.
For instance, to remove all rows from the products table that have a price of 10, use:
DELETE FROM products WHERE price = 10;
If you simply write:
DELETE FROM products;
then all rows in the table will be deleted! Caveat programmer.
83
87
Chapter 7. Queries
The previous chapters explained how to create tables, how to fill them with data, andhow to manipu-
late that data. Now we finally discuss how to retrieve the data from the database.
7.1. Overview
The process of retrieving or the command to retrieve data from a database is called a query. In SQL
the SELECT command is used to specify queries. The general syntax of the
SELECT
command is
[WITH
with_queries
] SELECT
select_list
FROM
table_expression
[
sort_specification
]
The following sections describe the details of the select list, the table expression, and the sort specifi-
cation.
WITH
queries are treated last since they are an advanced feature.
Asimple kind of query has the form:
SELECT
*
FROM table1;
Assuming that there is a table called
table1
,this command would retrieve all rows and all user-
defined columns from
table1
.(The methodof retrieval depends onthe client application. For exam-
ple, the psql program will display an ASCII-art table on the screen, while client libraries will offer
functions to extract individual values from the query result.) The select list specification
*
means all
columns thatthe table expression happens toprovide. Aselectlist can alsoselecta subsetof theavail-
able columns or makecalculations using the columns. For example, if
table1
has columns named
a
,
b
,and
c
(and perhaps others) you can make the following query:
SELECT a, b + c FROM table1;
(assuming that
b
and
c
are of a numerical data type). See Section 7.3 for more details.
FROM table1
isa simple kindof tableexpression:itreads just onetable. Ingeneral, tableexpressions
can be complex constructs of base tables, joins, and subqueries. But you can also omit the table
expression entirely and use the
SELECT
command as a calculator:
SELECT 3
*
4;
This is more useful if the expressions in the select list return varying results. For example, you could
call a function this way:
SELECT random();
7.2. Table Expressions
Atable expression computes a table. The table expression contains a
FROM
clause that is optionally
followed by
WHERE
,
GROUP BY
,and
HAVING
clauses. Trivial table expressions simply refer to a table
on disk, a so-called base table, but more complex expressions can be used to modify or combine base
tables in various ways.
The optional
WHERE
,
GROUP BY
,and
HAVING
clauses in the table expression specify a pipeline of
successive transformations performed on the table derived in the
FROM
clause. All these transforma-
84
127
Chapter 7. Queries
tions produce a virtual table that provides the rows that are passed to the select list to compute the
output rows of the query.
7.2.1. The
FROM
Clause
The FROM Clause derives a table from one or more other tables given in a comma-separated table
reference list.
FROM
table_reference
[,
table_reference
[, ...]]
Atable reference can be a table name (possibly schema-qualified), or a derived table such as a sub-
query, a
JOIN
construct, or complex combinations of these. If more than one table reference is listed
in the
FROM
clause, the tables are cross-joined (that is, the Cartesian product of their rows is formed;
see below). The result of the
FROM
list is an intermediate virtual table that can then be subject to
transformations by the
WHERE
,
GROUP BY
,and
HAVING
clauses and is finally the result of the overall
table expression.
When a table reference names a table that is the parent of a table inheritance hierarchy, the table
reference produces rows of not only that table but all of its descendant tables, unless the key word
ONLY
precedes the table name. However, the reference produces only the columns that appear in the
named table — any columns added in subtables are ignored.
Instead of writing
ONLY
before the table name, you can write
*
after the table name to explicitly
specify that descendant tables are included. Writing
*
is not necessary since that behavior is the
default (unless you have changed the setting of the sql_inheritance configuration option). However
writing
*
might be useful to emphasize that additional tables will be searched.
7.2.1.1. Joined Tables
Ajoined table is a table derived from two other (real or derived) tables according to the rules of the
particular join type. Inner, outer, and cross-joins are available. The general syntax of a joined table is
T1 join_type T2
[
join_condition
]
Joins of all types can be chained together, or nested: either or both
T1
and
T2
can be joined tables.
Parentheses can be used around
JOIN
clauses to control the join order. In the absence of parentheses,
JOIN
clauses nest left-to-right.
Join Types
Cross join
T1
CROSS JOIN
T2
For every possible combination of rows from
T1
and
T2
(i.e., a Cartesian product), the joined
table will contain a row consisting of all columns in
T1
followed by all columns in
T2
.If the
tables have N and M rows respectively, the joined table will have N * M rows.
FROM
T1
CROSS JOIN
T2
is equivalent to
FROM
T1
INNER JOIN
T2
ON TRUE
(see below).
It is also equivalent to
FROM
T1
,
T2
.
Note: This latter equivalence does not hold exactly when more than two tables appear, be-
cause
JOIN
binds more tightly than comma. For example
FROM
T1
CROSS JOIN
T2
INNER
JOIN
T3
ON
condition
is not the same as
FROM
T1
,
T2
INNER JOIN
T3
ON
condition
be-
cause the
condition
can reference
T1
in the first case but not the second.
85
108
Chapter 7. Queries
Qualified joins
T1
{ [INNER] | { LEFT | RIGHT | FULL } [OUTER] } JOIN
T2
ON
boolean_expression
T1
{ [INNER] | { LEFT | RIGHT | FULL } [OUTER] } JOIN
T2
USING (
join column list
)
T1
NATURAL { [INNER] | { LEFT | RIGHT | FULL } [OUTER] } JOIN
T2
The words
INNER
and
OUTER
are optional in all forms.
INNER
is the default;
LEFT
,
RIGHT
,and
FULL
imply an outer join.
The join condition is specified in the
ON
or
USING
clause, or implicitly by the word
NATURAL
.
The joincondition determines whichrows from the two source tables are consideredto “match”,
as explained in detail below.
The possible types of qualified join are:
INNER JOIN
For each row R1 of T1, the joined table has a row for each row in T2 that satisfies the join
condition with R1.
LEFT OUTER JOIN
First, an inner join is performed. Then, for each row in T1 that does not satisfy the join
condition with any row in T2, a joined row is added with null values in columns of T2.
Thus, the joined table always has at least one row for each row in T1.
RIGHT OUTER JOIN
First, an inner join is performed. Then, for each row in T2 that does not satisfy the join
condition withanyrow inT1, a joined row is added withnull values in columns of T1. This
is the converse of a left join: the result table will always have a row for each row in T2.
FULL OUTER JOIN
First, an inner join is performed. Then, for each row in T1 that does not satisfy the join
condition with any row in T2, a joined row is added with null values in columns of T2.
Also, for each row of T2 that does not satisfy the join condition with any row in T1, a
joined row with null values in the columns of T1 is added.
The
ON
clause is the most general kind of join condition: it takes a Boolean value expression
of the same kind as is used in a
WHERE
clause. A pair of rows from
T1
and
T2
match if the
ON
expression evaluates to true.
The
USING
clauseis a shorthandthat allows you totake advantageof the specific situation where
both sides of the join use the same name for the joining column(s). It takes a comma-separated
listof the shared column names and forms a join condition that includes an equality comparison
for each one. For example, joining
T1
and
T2
with
USING (a, b)
produces the join condition
ON
T1
.a =
T2
.a AND
T1
.b =
T2
.b
.
Furthermore, the output of
JOIN USING
suppresses redundantcolumns: there is no need to print
both of the matched columns, since they must have equal values. While
JOIN ON
produces all
columns from
T1
followed by all columns from
T2
,
JOIN USING
produces one output column
for each of the listed column pairs (inthe listedorder), followed byanyremainingcolumns from
T1
,followed by any remaining columns from
T2
.
86
97
Chapter 7. Queries
Finally,
NATURAL
is a shorthand form of
USING
:it forms a
USING
list consisting of all column
names that appear in both input tables. As with
USING
,these columns appear only once in the
output table. If there are no common column names,
NATURAL
behaves like
CROSS JOIN
.
Note:
USING
is reasonably safe from column changes in the joined relations since only the
listedcolumnsarecombined.
NATURAL
is considerablymoreriskysinceanyschemachanges
to either relation that cause a new matching column name to be present will cause the join
to combine that new column as well.
To put this together, assume we have tables
t1
:
num | name
-----+------
1 | a
2 | b
3 | c
and
t2
:
num | value
-----+-------
1 | xxx
3 | yyy
5 | zzz
then we get the following results for the various joins:
=> SELECT
*
FROM t1 CROSS JOIN t2;
num | name | num | value
-----+------+-----+-------
1 | a
|
1 | xxx
1 | a
|
3 | yyy
1 | a
|
5 | zzz
2 | b
|
1 | xxx
2 | b
|
3 | yyy
2 | b
|
5 | zzz
3 | c
|
1 | xxx
3 | c
|
3 | yyy
3 | c
|
5 | zzz
(9 rows)
=> SELECT
*
FROM t1 INNER JOIN t2 ON t1.num = t2.num;
num | name | num | value
-----+------+-----+-------
1 | a
|
1 | xxx
3 | c
|
3 | yyy
(2 rows)
=> SELECT
*
FROM t1 INNER JOIN t2 USING (num);
num | name | value
-----+------+-------
1 | a
| xxx
3 | c
| yyy
(2 rows)
87
Documents you may be interested
Documents you may be interested