9.1 Introduction
The genius of PHP is its seamless integration of form variables into your programs. It makes
web programming smooth and simple, from web form to PHP code to HTML output.
There's no built-in mechanism in HTTP to allow you to save information from one page so you
can access it in other pages. That's because HTTP is a stateless protocol. Recipe 9.2,
Recipe 9.4, Recipe 9.5, and Recipe 9.6 all show ways to work around the fundamental problem
of figuring out which user is making which requests to your web server.
Processing data from the user is the other main topic of this chapter. You should never trust
the data coming from the browser, so it's imperative to always validate all fields, even hidden
form elements. Validation takes many forms, from ensuring the data match certain criteria, as
discussed in Recipe 9.3, to escaping HTML entities to allow the safe display of user-entered
data, as covered in Recipe 9.9. Furthermore, Recipe 9.8 tells how to protect the security of
your web server, and Recipe 9.7 covers how to process files uploaded by a user.
Whenever PHP processes a page, it checks for GET and POST form variables, uploaded files,
applicable cookies, and web server and environment variables. These are then directly
accessible in the following arrays: $_GET, $_POST, $_FILES, $_COOKIE, $_SERVER, and $_ENV.
They hold, respectively, all variables set by GET requests, POST requests, uploaded files,
cookies, the web server, and the environment. There's also $_REQUEST, which merges the
values from $_GET, $_POST, and $_COOKIE into one array.
When placing elements inside of $_REQUEST, if two arrays both have a key with the same
name, PHP falls back upon the variables_order configuration directive. By default,
variables_order is EGPCS (or GPCS, if you're using the php.ini-recommended configuration
file). The letters stand for environment, GET, POST, cookie, and web server variables, and
PHP processes the sources in that order; of these, only GET, POST, and cookie variables are
added to $_REQUEST. For instance, since C comes after P in the default order, a cookie named
username overwrites a POST variable named username.
If you don't have access to PHP's configuration files, you can use ini_get( ) to check a
setting:
print ini_get('variables_order');
EGPCS
You may need to do this because your ISP doesn't let you view configuration settings or
because your script may run on someone else's server. You can also use phpinfo( ) to view
settings. However, if you can't rely on the value of variables_order, you should directly
access $_GET and $_POST instead of using $_REQUEST.
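As a minimal sketch of reading input without relying on variables_order, the helper below looks in explicitly chosen arrays. The function name input_value( ) and the choice to prefer POST over GET are assumptions made for illustration, not part of PHP itself.

```php
<?php
// Hypothetical helper: look up a form field in explicitly chosen sources,
// so the result does not depend on the variables_order setting.
function input_value($name, array $post, array $get, $default = null) {
    if (array_key_exists($name, $post)) {
        return $post[$name];   // POST wins because we chose that order ourselves
    }
    if (array_key_exists($name, $get)) {
        return $get[$name];
    }
    return $default;           // field was not submitted at all
}

// In a real page you would pass the superglobals directly.
$username = input_value('username', $_POST, $_GET, '');
```

Because the precedence is spelled out in the code, the script behaves the same way on any server, whatever its configuration.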
The arrays containing external variables, such as $_REQUEST, are superglobals. As such, they
don't need to be declared as global inside of a function or class. It also means you probably
shouldn't assign anything to these variables, or you'll overwrite the data stored in them.
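A short sketch of what this means in practice: the function below reads $_SERVER without any global declaration. The function name request_method( ) and the fallback string are assumptions chosen for illustration.

```php
<?php
// $_SERVER is a superglobal, so it is visible inside the function
// without a "global $_SERVER;" declaration.
function request_method() {
    // REQUEST_METHOD is not set when the script runs outside a web server,
    // so fall back to a placeholder string in that case.
    return isset($_SERVER['REQUEST_METHOD']) ? $_SERVER['REQUEST_METHOD'] : 'none';
}

print 'Request method: ' . request_method();
```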
Prior to PHP 4.1, these superglobal variables didn't exist. Instead there were regular arrays
named $HTTP_COOKIE_VARS, $HTTP_ENV_VARS, $HTTP_GET_VARS, $HTTP_POST_VARS,
$HTTP_POST_FILES, and $HTTP_SERVER_VARS. These arrays are still available for legacy
reasons, but the newer arrays are easier to work with. These older arrays are populated only if
the track_vars configuration directive is on, but, as of PHP 4.0.3, this feature is always
enabled.
Finally, if the register_globals configuration directive is on, all these variables are also
available as variables in the global namespace. So, $_GET['password'] is also just
$password. While convenient, this introduces major security problems because malicious
users can easily set variables from the outside and overwrite trusted internal variables.
Starting with PHP 4.2, register_globals defaults to off.
With this knowledge, here is a basic script to put things together. The form asks the user to
enter his first name, then replies with a welcome message. The HTML for the form looks like
this:
<form action="/hello.php" method="post">
What is your first name?
<input type="text" name="first_name">
<input type="submit" value="Say Hello">
</form>
The name of the text input element inside the form is first_name. Also, the method of the
form is post. This means that when the form is submitted, $_POST['first_name'] will
hold whatever string the user typed in. (It could also be empty, of course, if he didn't type
anything.)
For simplicity, however, let's assume the value in the variable is valid. (The term "valid" is
open for definition, depending on certain criteria, such as not being empty, not being an
attempt to break into the system, etc.) This allows us to omit the error checking stage, which
is important but gets in the way of this simple example. So, here is a simple hello.php script to
process the form:
echo 'Hello ' . $_POST['first_name'] . '!';
If the user's first name is Joe, PHP prints out:
Hello Joe!
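The one-line script above deliberately skips error checking. As a hedged sketch of a more defensive version, the helper below tolerates a missing field and escapes the name with htmlentities( ) before printing it. The function name hello_message( ) and the UTF-8 encoding argument are assumptions for illustration.

```php
<?php
// Hypothetical helper: build the greeting from submitted form data,
// escaping the name so any HTML in it is displayed rather than interpreted.
function hello_message(array $post) {
    // Treat a missing field as an empty name instead of raising a notice.
    $name = isset($post['first_name']) ? $post['first_name'] : '';
    return 'Hello ' . htmlentities($name, ENT_QUOTES, 'UTF-8') . '!';
}

echo hello_message($_POST);
```

Passing the array in as a parameter keeps the function easy to try out with sample data before wiring it to a real form.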
Recipe 9.2 Processing Form Input
9.2.1 Problem
You want to use the same HTML page to emit a form and then process the data entered into
it. In other words, you're trying to avoid a proliferation of pages that each handle different
steps in a transaction.
9.2.2 Solution
Use a hidden field in the form to tell your program that it's supposed to be processing the
form. In this case, the hidden field is named stage and has a value of process:
if (isset($_POST['stage']) && ('process' == $_POST['stage'])) {
    process_form();
} else {
    print_form();
}
9.2.3 Discussion
During the early days of the Web, when people created forms, they made two pages: a static
HTML page with the form and a script that processed the form and returned a dynamically
generated response to the user. This was a little unwieldy, because form.html led to form.cgi
and if you changed one page, you needed to also remember to edit the other, or your script
might break.
Forms are easier to maintain when all parts live in the same file and context dictates which
sections to display. Use a hidden form field named stage to track your position in the flow of
the form process; it acts as a trigger for the steps that return the proper HTML to the user.
Sometimes, however, it's not possible to design your code to do this; for example, when your
form is processed by a script on someone else's server.
When writing the HTML for your form, don't hardcode the path to your page directly into the
action attribute. Doing so makes it impossible to rename or relocate your page without also
editing it. Instead, PHP supplies a helpful variable:
$_SERVER['PHP_SELF']
This variable holds the path of the current page as it was requested from the web server. So,
set the value of the action attribute to that value, and your form always resubmits to itself,
even if you've moved the file to a new place on the server.
So, the example in the introduction of this chapter is now:
if (isset($_POST['stage']) && ('process' == $_POST['stage'])) {
    process_form();
} else {
    print_form();
}

function print_form() {
    echo <<<END
<form action="$_SERVER[PHP_SELF]" method="post">
What is your first name?
<input type="text" name="first_name">
<input type="hidden" name="stage" value="process">
<input type="submit" value="Say Hello">
</form>
END;
}

function process_form() {
    echo 'Hello ' . $_POST['first_name'] . '!';
}
If your form has more than one step, just set stage to a new value for each step.
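One way to sketch a form with more than one step is a small dispatcher keyed on the stage value. The stage name confirm, the function print_confirmation, and the helper next_step( ) are assumptions introduced for illustration; only print_form and process_form come from the example above.

```php
<?php
// Hypothetical dispatcher: map the hidden "stage" value to the name of
// the function that should handle the current request.
function next_step(array $post) {
    $stage = isset($post['stage']) ? $post['stage'] : '';
    switch ($stage) {
        case 'confirm':
            return 'print_confirmation';   // step 2: show what was entered
        case 'process':
            return 'process_form';         // step 3: act on the data
        default:
            return 'print_form';           // step 1, and any unknown stage
    }
}

// Name of the function the page would call for this request.
$step = next_step($_POST);
```

Sending any unrecognized stage back to the first step means a tampered hidden field can't skip ahead to a step you never defined.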
9.2.4 See Also
Recipe 9.4 for handling multipage forms.
Recipe 9.3 Validating Form Input
9.3.1 Problem
You want to ensure data entered from a form passes certain criteria.
9.3.2 Solution
Create a function that takes a string to validate and returns true if the string passes a check
and false if it doesn't. Inside the function, use regular expressions and comparisons to check
the data. For example, Example 9-1 shows the pc_validate_zipcode( ) function, which
validates a U.S. Zip Code.
Example 9-1. pc_validate_zipcode( )
function pc_validate_zipcode($zipcode) {
    // preg_match() returns an integer, so cast the result to a boolean
    return (bool) preg_match('/^[0-9]{5}([- ]?[0-9]{4})?$/', $zipcode);
}
Here's how to use it:
if (pc_validate_zipcode($_REQUEST['zipcode'])) {
    // U.S. Zip Code is okay, can proceed
    process_data();
} else {
    // this is not an okay Zip Code, print an error message
    print "Your ZIP Code should be 5 digits (or 9 digits, if you're ";
    print "using ZIP+4).";
    print_form();
}
9.3.3 Discussion
Deciding what constitutes valid and invalid data is almost more of a philosophical task than a
straightforward matter of following a series of fixed steps. In many cases, what may be
perfectly fine in one situation won't be correct in another.
The easiest check is making sure the field isn't blank. The empty( ) function best handles
this problem.
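One caveat is that empty( ) also treats the string "0" as empty. As a hedged variation on the blank check, the sketch below compares the trimmed value against the empty string instead, so a legitimate "0" gets through. The function name is_blank( ) is an assumption for illustration.

```php
<?php
// Hypothetical helper: true if the field is missing, is not a string,
// or contains nothing but whitespace.
function is_blank(array $input, $field) {
    if (!isset($input[$field]) || !is_string($input[$field])) {
        return true;
    }
    // Unlike empty(), this comparison does not reject the string "0".
    return trim($input[$field]) === '';
}
```

Whether "0" should count as blank depends on the field, so pick whichever check matches the data you expect.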
Next come relatively easy checks, such as the case of a U.S. Zip Code. Usually, a regular
expression or two can solve these problems. For example:
/^[0-9]{5}([- ]?[0-9]{4})?$/
matches any correctly formatted U.S. Zip Code, in either five-digit or ZIP+4 form.
Sometimes, however, coming up with the correct regular expression is difficult. If you want to
verify that someone has entered only two names, such as "Alfred Aho," you can check
against:
/^[A-Za-z]+ +[A-Za-z]+$/
However, Tim O'Reilly can't pass this test. An alternative is /^\S+\s+\S+$/; but then Donald
E. Knuth is rejected. So think carefully about the entire range of valid input before writing
your regular expression.
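To see how the two patterns behave, this short sketch runs each of the three names from the text through both of them. The variable names are assumptions for illustration.

```php
<?php
// The two patterns discussed above.
$letters_only = '/^[A-Za-z]+ +[A-Za-z]+$/';
$two_tokens   = '/^\S+\s+\S+$/';

$names = array('Alfred Aho', "Tim O'Reilly", 'Donald E. Knuth');

// Print 1 where a pattern accepts the name and 0 where it rejects it.
foreach ($names as $name) {
    printf("%-16s letters-only: %d  two-tokens: %d\n",
        $name,
        preg_match($letters_only, $name),
        preg_match($two_tokens, $name));
}
```

Only "Alfred Aho" passes both patterns; "Tim O'Reilly" passes just the second, and "Donald E. Knuth" passes neither.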
In some instances, even with regular expressions, it becomes difficult to check if the field is
legal. One particularly popular and tricky task is validating an email address, as discussed in
Recipe 13.7. Another is how to make sure a user has correctly entered the name of her U.S.
state. You can check against a listing of names, but what if she enters her postal service
abbreviation? Will MA instead of Massachusetts work? What about Mass.?
One way to avoid this issue is to present the user with a dropdown list of pregenerated
choices. Using a select element, users are forced by the form's design to select a state in
the format that always works, which can reduce errors. This, however, presents another series
of difficulties. What if the user lives some place that isn't one of the choices? What if the range
of choices is so large this isn't a feasible solution?
There are a number of ways to solve these types of problems. First, you can provide an
"other" option in the list, so that a non-U.S. user can successfully complete the form.
(Otherwise, she'll probably just pick a place at random, so she can continue using your site.)
Next, you can divide the registration process into a two-part sequence. For a long list of
options, a user begins by picking the letter of the alphabet his choice begins with; then, a new
page provides him with a list containing only the choices beginning with that letter.
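The dropdown approach, including an "other" option, can be sketched as follows. The list of states is deliberately abbreviated, and the function names state_select( ) and valid_state( ) are assumptions for illustration. The server-side check is included because a submitted value can be forged even when the form offers only fixed choices.

```php
<?php
// Hypothetical list of allowed choices; a real form would list every state.
$states = array('MA' => 'Massachusetts', 'NY' => 'New York', 'other' => 'Other');

// Build the <select> element from the list of choices.
function state_select(array $states) {
    $html = '<select name="state">';
    foreach ($states as $code => $label) {
        $html .= sprintf('<option value="%s">%s</option>',
            htmlentities($code), htmlentities($label));
    }
    return $html . '</select>';
}

// A submitted value is valid only if it is one of the keys we offered.
function valid_state($value, array $states) {
    return is_string($value) && array_key_exists($value, $states);
}
```

Generating the form and validating the input from the same array keeps the two from drifting apart when the list changes.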
Finally, there are even trickier problems. What do you do when you want to make sure the
user has correctly entered information, but you don't want to tell her you did so? A situation
where this is important is a sweepstakes; in a sweepstakes, there's often a special code box
on the entry form in which a user enters a string (AD78DQ, for example) from an email or
flier she's received. You want to make sure there are no typos, or your program won't count
her as a valid entrant. You also don't want to allow her to just guess codes, because then she
could try out those codes and crack the system.
The solution is to have two input boxes. A user enters her code twice; if the two fields match,
you accept the data as legal and then (silently) validate the data. If the fields don't match,
you reject the entry and have the user fix it. This procedure eliminates typos and doesn't
reveal how the code validation algorithm works; it can also prevent misspelled email
addresses.
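The matching step of the two-box approach can be sketched as below. The field names code1 and code2 and the function name codes_match( ) are assumptions for illustration; checking the code against the list of issued codes would happen separately, and silently, afterward.

```php
<?php
// Hypothetical check: the code must be typed identically in both boxes.
function codes_match(array $input) {
    if (!isset($input['code1'], $input['code2'])) {
        return false;   // one or both boxes were not submitted
    }
    if (!is_string($input['code1']) || !is_string($input['code2'])) {
        return false;   // array-valued fields cannot be valid codes
    }
    $first  = trim($input['code1']);
    $second = trim($input['code2']);
    // Reject blank entries, then require an exact, case-sensitive match.
    return $first !== '' && $first === $second;
}
```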
Keep in mind that PHP performs server-side validation. Server-side validation requires that a
request be made to the server, and a page returned in response; as a result, it can be slow. It's
also possible to do client-side validation using JavaScript. While client-side validation is faster,
it exposes your code to the user and may not work if the client doesn't support JavaScript or
has disabled it. Therefore, you should always duplicate all client-side validation code on the
server.
9.3.4 See Also
Recipe 13.7 for a regular expression for validating email addresses; Chapter 7, "Validation on
the Server and Client," of Web Database Applications with PHP and MySQL (Hugh Williams and
David Lane, O'Reilly).
Recipe 9.4 Working with Multipage Forms
9.4.1 Problem
You want to use a form that displays more than one page and preserve data from one page to
the next.
9.4.2 Solution
Use session tracking:
session_start();
$_SESSION['username'] = $_GET['username'];
You can also include variables from a form's earlier pages as hidden input fields in its later
pages:
<input type="hidden" name="username"
value="<?php echo htmlentities($_GET['username']); ?>">
9.4.3 Discussion
Whenever possible, use session tracking. It's more secure because session variables are stored
on the server, where users can't modify them. To begin a session, call session_start( ); this
creates a new session or resumes an existing one. Note that this step is unnecessary if you've
enabled session.auto_start in your php.ini file. Variables assigned to $_SESSION are
automatically propagated to later requests in the same session. In the Solution example, the
form's username variable is preserved by assigning $_GET['username'] to
$_SESSION['username'].
To access this value on a subsequent request, call session_start( ) and then check
$_SESSION['username']:
session_start( );
$username = htmlentities($_SESSION['username']);
print "Hello $username.";
In this case, if you don't call session_start( ), $_SESSION isn't set.
Be sure to secure the server and location where your session files are located (the filesystem,
database, etc.); otherwise your system will be vulnerable to identity spoofing.
If session tracking isn't enabled for your PHP installation, you can use hidden form variables as
a replacement. However, passing data using hidden form elements isn't secure because
anyone can edit these fields and fake a request; with a little work, you can increase the
security to a reliable level.
The most basic way to use hidden fields is to include them inside your form.
<form action="<?php echo $_SERVER['PHP_SELF']; ?>"
method="get">
<input type="hidden" name="username"
value="<?php echo htmlentities($_GET['username']); ?>">
When this form is resubmitted,
$_GET['username']
holds its previous value unless someone
has modified it.
A more complex but secure solution is to convert your variables to a string using serialize( ),
compute a secret hash of the data, and place both pieces of information in the form. Then,
on the next request, validate the data and unserialize it. If it fails the validation test, you'll
know someone has tried to modify the information.
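The general idea can be sketched as follows. This is not the pc_encode( ) function from Example 9-2; it is an independent illustration that assumes a PHP version providing hash_hmac( ) and hash_equals( ), and the names sign_data( ), verify_data( ), and $signing_key are hypothetical.

```php
<?php
// Assumption: a server-side key that never appears in the form itself.
$signing_key = 'replace-with-a-long-random-string';

// Hypothetical helper: serialize the data and append a keyed hash of it.
function sign_data(array $data, $key) {
    $payload = base64_encode(serialize($data));
    // Base64 output never contains ':', so it is safe as a separator.
    return $payload . ':' . hash_hmac('sha256', $payload, $key);
}

// Hypothetical helper: return the original array, or false if the
// payload or its hash has been altered.
function verify_data($signed, $key) {
    $parts = explode(':', $signed);
    if (count($parts) !== 2) {
        return false;
    }
    list($payload, $hash) = $parts;
    // Recompute the hash and compare in constant time.
    if (!hash_equals(hash_hmac('sha256', $payload, $key), $hash)) {
        return false;
    }
    // Only unserialize data whose hash has been verified.
    return unserialize(base64_decode($payload));
}
```

The signed string would go into a single hidden field; on the next request, verify_data( ) either returns the original array or reports tampering by returning false.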
The pc_encode( ) encoding function shown in Example 9-2 takes the data to encode in the
form of an array.
Example 9-2. pc_encode( )
$secret = 'Foo25bAr52baZ';