
WWW(3)                     Library Functions Manual                     WWW(3)

NAME
       WWW - World Wide Web Package

SYNOPSIS
       extract_description( FILE )
       extract_meta( FILE, NAME )
       hyperlink( LIST )

DESCRIPTION
       This package provides utility functions for the World Wide Web that
       extract descriptions or META information from files and add
       hyperlinks to text.

SUBROUTINES
       The following Perl subroutines are defined and available:

       extract_description( FILE )
              Extracts a description from the HTML or plain text file named
              by FILE, which should be an absolute path.  Only the first
              $description::chars (default: 2048) characters are read.  If
              the file name ends in one of the extensions htm, html, or
              shtml, the file is presumed to be HTML; if it ends in txt, it
              is presumed to be plain text.  Other extensions are not
              recognized, and no description is returned for them.
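
              The extension rule above can be sketched as follows.  This is
              an illustration only, not the package's code: file_kind is a
              hypothetical helper, and matching the extensions
              case-sensitively is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the WWW package: classify a file name
# by its extension, following the rule described above.
# Assumption: extensions are matched case-sensitively.
sub file_kind {
    my ($file) = @_;
    return 'html' if $file =~ /\.(?:htm|html|shtml)\z/;
    return 'text' if $file =~ /\.txt\z/;
    return;    # unrecognized extension: no description would be returned
}
```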

              For HTML files, if a <META NAME="description" CONTENT="...">
              or a <META NAME="DC.description" CONTENT="..."> (Dublin Core)
              element is found, the words given as the value of its CONTENT
              attribute are returned as the description.

              Otherwise, all HTML comments, text  between  <SCRIPT>,  <STYLE>,
              and  <TITLE>  tags,  and  all  other HTML tags are stripped.  If
              <AREA ... ALT="..."> or <IMG ... ALT="..."> elements are  found,
              then  the words specified as the value of the ALT attributes are
              extracted.

              Finally, for either HTML or plain text files, at most
              $description::words (default: 50) words are returned.
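
              The steps described for HTML files might be approximated as in
              the sketch below.  It works on a string rather than a file, and
              describe_html, $MAX_CHARS, and $MAX_WORDS are hypothetical
              names standing in for the package's internals.  It also assumes
              that NAME precedes CONTENT and that attribute values are
              double-quoted, which real HTML does not guarantee.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for $description::chars and $description::words.
my $MAX_CHARS = 2048;
my $MAX_WORDS = 50;

# Illustrative only: derive a description from HTML text held in a string.
sub describe_html {
    my ($html) = @_;
    my $text = substr($html, 0, $MAX_CHARS);

    # Step 1: prefer a META description (plain or Dublin Core).
    # Assumption: NAME comes before CONTENT and both are double-quoted.
    if ($text =~ /<meta\s+name="(?:DC\.)?description"\s+content="([^"]*)"/i) {
        $text = $1;
    }
    else {
        # Step 2: drop comments and SCRIPT, STYLE, and TITLE elements
        # together with the text they enclose.
        $text =~ s/<!--.*?-->//gs;
        $text =~ s/<(script|style|title)\b[^>]*>.*?<\/\1\s*>//gis;

        # Step 3: keep ALT text from AREA and IMG elements, then strip
        # every remaining tag.
        $text =~ s/<(?:area|img)\b[^>]*\balt="([^"]*)"[^>]*>/ $1 /gi;
        $text =~ s/<[^>]*>/ /g;
    }

    # Step 4: return at most $MAX_WORDS whitespace-separated words.
    my @words = split ' ', $text;
    splice(@words, $MAX_WORDS) if @words > $MAX_WORDS;
    return join ' ', @words;
}
```

              Plain text files would need only the last step, since there
              is no markup to remove.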

       extract_meta( FILE, NAME )
              Extracts the value of the CONTENT attribute of the META element
              having the given NAME attribute from the HTML file named by
              FILE, which should be an absolute path.  The file name must end
              in one of the extensions htm, html, or shtml for the file to be
              considered HTML.  Only the first $description::chars (default:
              2048) characters are read.  The characters read are cached
              between consecutive calls that use the same file name.
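
              The caching behavior can be pictured with a one-entry cache
              keyed on the file name, as in this sketch.  The names
              meta_content, $MAX_CHARS, $cached_file, and $cached_text are
              hypothetical, and the regular expression assumes that NAME
              precedes CONTENT with double-quoted values; the package itself
              may work differently.

```perl
use strict;
use warnings;

my $MAX_CHARS = 2048;               # stand-in for $description::chars
my ($cached_file, $cached_text);    # one-entry cache: the last file read

# Illustrative only: return the CONTENT value of the META element whose
# NAME attribute equals $name, or undef if there is none.
sub meta_content {
    my ($file, $name) = @_;

    # Only htm, html, and shtml files are treated as HTML.
    return unless $file =~ /\.(?:htm|html|shtml)\z/;

    # Re-read the file only when the name differs from the previous call.
    unless (defined $cached_file && $cached_file eq $file) {
        open my $fh, '<', $file or return;
        my $n = read $fh, my $buf, $MAX_CHARS;
        close $fh;
        return unless defined $n;
        ($cached_file, $cached_text) = ($file, $buf);
    }

    # Assumption: NAME precedes CONTENT and both values are double-quoted.
    return $cached_text =~ /<meta\s+name="\Q$name\E"\s+content="([^"]*)"/i
        ? $1
        : undef;
}
```

              With a cache of this shape, asking for several META names from
              the same file in a row reads the file only once.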

       hyperlink( LIST )
              Adds hyperlinks to strings: substrings that are valid URLs
              (according to RFC 1630) have the appropriate HTML tags
              ``wrapped'' around them so that they are selectable when
              displayed in a browser.  The ftp, gopher, http, https, mailto,
              news, telnet, and wais URL schemes are recognized.  Example:

                 Read all about it at
                 http://www.usatoday.com/

              becomes:

                 Read all about it at
                 <A HREF="http://www.usatoday.com/">http://www.usatoday.com/</A>
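
              A rough, standalone approximation of this transformation is
              shown below.  The name link_urls is hypothetical, and the
              pattern is a deliberate simplification: it does not implement
              the RFC 1630 grammar, and it would include trailing
              punctuation such as a sentence-ending period in the link.

```perl
use strict;
use warnings;

# Illustrative only: wrap URL-like substrings in <A HREF="...">...</A>.
# Works on copies of the arguments and returns the modified strings
# (the first one when called in scalar context).
sub link_urls {
    my @out = @_;
    for (@out) {
        # Simplified pattern: a recognized scheme followed by characters
        # that are not whitespace, angle brackets, or double quotes.
        s{\b((?:(?:ftp|gopher|https?|telnet|wais)://|(?:mailto|news):)[^\s<>"]+)}
         {<A HREF="$1">$1</A>}g;
    }
    return wantarray ? @out : $out[0];
}

print link_urls("Read all about it at http://www.usatoday.com/"), "\n";
```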

SEE ALSO
       perl(1)

       Tim  Berners-Lee.   ``Universal  Resource Identifiers in WWW,'' Request
       for Comments 1630, Network Working Group of  the  Internet  Engineering
       Task Force, June 1994.

       Tim Berners-Lee, Larry Masinter, and Mark McCahill.  ``Uniform Resource
       Locators  (URL),''  Request  for  Comments 1738, Network Working Group,
       1994.

       Dave Raggett, Arnaud Le Hors,  and  Ian  Jacobs.   ``Notes  on  helping
       search  engines index your Web site,'' HTML 4.0 Specification, Appendix
       B: Performance, Implementation, and Design Notes, World Wide  Web  Con-
       sortium, April 1998.

       --.   ``Objects,  Images, and Applets: How to specify alternate text,''
       HTML 4.0 Specification, §13.8, World Wide Web Consortium, April 1998.

       Dublin Core Directorate.  ``The Dublin Core: A Simple Content  Descrip-
       tion Model for Electronic Resources.''

       Larry  Wall,  et al.  Programming Perl, 3rd ed., O'Reilly & Associates,
       Inc., Sebastopol, CA, 2000.

AUTHOR
       Paul J. Lucas <pauljlucas@mac.com>

WWW                            February 12, 2000                        WWW(3)
