CGI Programming on the World Wide Web

10.5 Checking Hypertext (HTTP) Links

If you look back at the guestbook example in Chapter 7, Advanced Form Applications, you will notice that one of the fields asked for the user's HTTP server. At that time, we did not discuss any way to check whether the address given by the user is valid. With our new knowledge of sockets and network communication, however, we can determine the validity of the address. After all, web servers have to use the same Internet protocols as everyone else; they possess no magic. If we open a TCP/IP socket connection to a web server, we can pass it commands it recognizes, just as we passed a command to the finger daemon (server). Before we go any further, here is a small snippet of code from the guestbook that outputs the user-specified URL:

        if ($FORM{'www'}) {
            print GUESTBOOK <<End_of_Web_Address;
<P>
$FORM{'name'} can also be reached at:
<A HREF="$FORM{'www'}">$FORM{'www'}</A>
End_of_Web_Address
        }

Here is a subroutine that uses the socket library to check whether a URL is valid. It takes one argument, the URL to check.

sub check_url 
{
    local ($url) = @_;
    local ($current_host, $host, $service, $file, $first_line);
    if (($host, $service, $file) = 
        ($url =~ m|http://([^/:]+):{0,1}(\d*)(\S*)$|)) {

This regular expression parses the specified URL and retrieves the hostname, the port number (if included), and the file.
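To see what the three parenthesized groups capture, here is a minimal standalone sketch that applies the same pattern to a few sample URLs. The helper name parse_url and the example.com addresses are made up for illustration; they are not part of the guestbook.

```perl
use strict;
use warnings;

# Hypothetical helper: split a URL with the same pattern used in check_url.
# Returns (host, port, file); port and file are empty strings when absent.
# Returns an empty list when the URL does not match.
sub parse_url {
    my ($url) = @_;
    return ($url =~ m|http://([^/:]+):{0,1}(\d*)(\S*)$|);
}

foreach my $url ("http://www.example.com",
                 "http://www.example.com:8080/docs/index.html",
                 "ftp://ftp.example.com/pub") {
    my ($host, $port, $file) = parse_url($url);
    if (defined $host) {
        print "$url -> host='$host' port='$port' file='$file'\n";
    } else {
        print "$url -> no match\n";
    }
}
```

Note that the pattern only recognizes the http scheme, so any other kind of URL falls through to the final else branch of the subroutine and is reported as invalid.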

Let's continue with the program:

        chop ($current_host = `/bin/hostname`);
        $host = $current_host  if ($host eq "localhost");
        $service = "http"      unless ($service);
        $file = "/"            unless ($file);

If the hostname is given as "localhost", the current hostname is used. In addition, the service name and the file are set to "http" and "/", respectively, if the URL did not specify them.
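The effect of these defaults can be checked in isolation. The following is only a sketch: apply_defaults is a hypothetical helper, and the current hostname is passed in as an argument (here the made-up www.example.com) so that the example does not depend on running the hostname command.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the three default assignments above.
# $current_host is supplied by the caller instead of being looked up.
sub apply_defaults {
    my ($host, $service, $file, $current_host) = @_;
    $host    = $current_host if ($host eq "localhost");
    $service = "http"        unless ($service);
    $file    = "/"           unless ($file);
    return ($host, $service, $file);
}

# A bare "http://localhost" parses to ("localhost", "", "").
my @filled = apply_defaults("localhost", "", "", "www.example.com");
print join(" ", @filled), "\n";
```

When the URL carries an explicit port such as 8080, $service holds that number rather than the name "http".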

        &open_connection (HTTP, $host, $service) || return (0);   
        print HTTP "HEAD $file HTTP/1.0", "\n\n";

A socket is created, and a connection is attempted to the remote host. If it fails, an error status of zero is returned. If it succeeds, the HEAD command is issued to the HTTP server. If the specified document exists, the server returns something like this:

HTTP/1.0 200 OK
Date: Fri Nov  3 06:09:17 1995 GMT
Server: NCSA/1.4.2
MIME-version: 1.0
Content-type: text/html
Last-modified: Sat Feb  4 17:56:33 1995 GMT
Content-length: 486

All we are concerned about is the first line, which contains a status code. If the status code is 200, the subroutine returns a success status of one. If the document is protected or does not exist, the server responds with a status code of 401 or 404, respectively (see Chapter 3, Output from the Common Gateway Interface), and the subroutine returns zero. Here is the code to check the status:

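        chop ($first_line = <HTTP>);
        # Close the socket before returning; a close placed after the
        # return statements would never be reached.
        close (HTTP);
        # Match 200 only where the status code appears in the status line.
        if ($first_line =~ m|^HTTP/\d+\.\d+\s+200\b|) {
            return (1);
        } else {
            return (0);
        }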
    } else {
        return (0);
    }
}
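The subroutine collapses every outcome into one or zero. If you need to tell a protected document (401) from a missing one (404), one option is to pull the numeric code out of the status line. Here is a minimal sketch of that idea; status_code is a hypothetical helper, and the sample lines are typed in by hand rather than read from a server.

```perl
use strict;
use warnings;

# Hypothetical helper: return the three-digit status code from an HTTP
# status line, or undef if the line does not look like one.
sub status_code {
    my ($line) = @_;
    if (defined $line && $line =~ m|^HTTP/\d+\.\d+\s+(\d{3})\b|) {
        return $1;
    }
    return undef;
}

foreach my $line ("HTTP/1.0 200 OK", "HTTP/1.0 404 Not Found", "garbage") {
    my $code = status_code($line);
    print "$line -> ",
          (defined $code ? $code : "not an HTTP status line"), "\n";
}
```

A caller could then choose a different error message for each code instead of reporting every failure as a nonexistent URL.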

This is how you would use this subroutine in the guestbook:

        if ($FORM{'www'}) {
            &check_url ($FORM{'www'}) ||
                &return_error (500, "Guestbook File Error",
                "The specified URL does not exist. Please enter a valid URL.");
            print GUESTBOOK <<End_of_Web_Address;
<P>
$FORM{'name'} can also be reached at:
<A HREF="$FORM{'www'}">$FORM{'www'}</A>
End_of_Web_Address
        }

Now, let's look at an example that creates a gateway to the Archie server using pre-existing client software.

