swish++.conf(5)               File Formats Manual              swish++.conf(5)

NAME
       swish++.conf - SWISH++ configuration file format

DESCRIPTION
       The  configuration  file format used by SWISH++ consists of three types
       of lines: blank lines, comments, and variable definitions.

   Blank lines
       Blank lines, or lines consisting entirely of whitespace, are ignored.

   Comments
       Comments start with the # character and continue up  to  and  including
       the  end  of the line.  While leading whitespace is permitted, comments
       are treated as such only if they are on lines by themselves.

   Variable definitions
       Variable definition lines are of the form:

            variable_name argument(s)

       where variable_name is a member of one of the types described in the
       remaining sections, and the argument(s) depend on the particular
       variable.  For variable_name, case is irrelevant.
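
       As an illustrative sketch (the file name and settings are arbitrary,
       not taken from any shipped configuration), a file containing all three
       kinds of lines might look like:

            # Index file to generate (hypothetical file name).
            IndexFile       my.index

                # Variable names are case-insensitive, so the next line
                # could equally begin with stemwords or STEMWORDS.
            StemWords       yes

       Because comments are recognized only on lines by themselves, a # that
       follows the argument on a variable definition line would presumably
       be read as part of the argument rather than as a comment.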

   Boolean variables
       Variables of this type take one argument that must be one of: f, false,
       n, no, off, on, t, true, y, or yes.  Case is irrelevant.  Variables  of
       this  type are: AssociateMeta, ExtractFilter, FollowLinks, Incremental,
       LaunchdCooperation, RecurseSubdirs,  SearchBackground,  StemWords,  and
       StoreWordPositions.
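
       For instance, each of the following lines is a well-formed Boolean
       definition according to the rules above (the particular settings are
       arbitrary choices for illustration, not recommendations):

            StemWords       yes
            FollowLinks     off
            RecurseSubdirs  T
            Incremental     False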

   Enumeration variables
       Variables of this type are just like string variables (see below)
       except that the argument must be one of a set of predetermined values.
       Case is irrelevant.  Variables of this type are: ResultsFormat and
       SearchDaemon.  ResultsFormat must be either classic or XML.
       SearchDaemon must be one of: none, tcp, unix, or both.
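
       A minimal sketch of both variables, with values picked arbitrarily
       from the permitted sets:

            ResultsFormat   XML
            SearchDaemon    tcp

       Since case is irrelevant, xml and TCP should be accepted as well.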

   Filter variables
       Variables of this type are of the form:

            pattern command

       where pattern is a shell pattern (a glob as described in glob(7), not
       a regular expression) and command is the command line to execute the
       filter.

       Within a command, there are a few % substitutions that are done at run-
       time:

            b    Basename of filename.
            B    Basename minus last extension.
            e    Extension of filename.
            E    Second-to-last extension of filename.
            f    Entire filename.
            F    Filename minus last extension.

       That is: the % and one character immediately after it  are  substituted
       as  described  in  the  above table.  Substituted filenames are skipped
       past and not rescanned for more substitutions, but the remainder of the
       command is.  To use a literal % or @, simply double it.  (For  more  on
       filter variables, see FILTERS below.)

       Variables of this type are: FilterAttachment and FilterFile.
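
       As a worked illustration, assume a hypothetical file named
       /home/user/report.pdf.gz and assume that ``basename'' has its usual
       meaning of the filename with the leading directory removed.  The
       substitutions would then expand as follows:

            %f   /home/user/report.pdf.gz
            %F   /home/user/report.pdf
            %b   report.pdf.gz
            %B   report.pdf
            %e   the gz extension
            %E   the pdf extension

       Whether %e and %E include the leading dot is not stated here, so they
       are described rather than shown literally.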

   Integer variables
       Variables  of this type take one numeric argument.  A special string of
       infinity is taken to mean ``the largest possible integer value.''  Case
       is irrelevant.  Variables of this type are:  FilesReserve,  ResultsMax,
       SocketQueueSize,  SocketTimeout, ThreadsMax, ThreadsMin, ThreadTimeout,
       TitleLines, Verbosity,  WordFilesMax,  WordPercentMax,  WordsNear,  and
       WordThreshold.

       For  WordThreshold, only the super-user can specify a value larger than
       the compiled-in default.
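
       For example, the following lines would all be well-formed (the
       numbers are arbitrary and are not the compiled-in defaults):

            ResultsMax      infinity
            TitleLines      20
            Verbosity       2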

   Percentage variables
       Variables of this type are like integer variables except that if an
       optional trailing percent sign (%) is present, the value is taken to be
       a percentage rather than an absolute number.  The only variable of this
       type is FilesGrow.
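
       The two forms can be sketched as the following alternative lines (the
       numbers are arbitrary):

            # An absolute number.
            FilesGrow       100

            # A percentage.
            FilesGrow       50%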

   String variables
       Variables of this type take one argument that is the remainder  of  the
       line  minus  leading  and trailing whitespace.  To preserve whitespace,
       surround the argument in either single or double  quotes.   The  quotes
       themselves  are  stripped  from  the  argument, but only if they match.
       Variables of this type are: ExtractExtension, Group, IndexFile,
       PidFile, ResultSeparator, SocketFile, StopWordFile, TempDirectory, and
       User.
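
       A brief sketch of the quoting rule, using a hypothetical file name
       and separator:

            # No quotes are needed; surrounding whitespace is trimmed.
            IndexFile         my.index

            # The matching quotes are stripped, and the spaces on either
            # side of the | are preserved.
            ResultSeparator   " | "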

   Set variables
       Variables of this type take one or more arguments separated by
       whitespace.  Variables of this type are: ExcludeClass, ExcludeFile,
       ExtractFile, and ExcludeMeta.
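
       For example, assuming that ExcludeFile takes filename patterns (as
       the discussion of extensions under FILTERS suggests), a line naming
       three hypothetical patterns would be:

            ExcludeFile     *.gif *.jpg *.png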

   Other variables
       Variables of this type are: IncludeFile, IncludeMeta, and
       SocketAddress.

       An IncludeFile configuration file line is of the form:

            module_name pattern ...

       where module_name is the name of the module  (case  is  irrelevant)  to
       handle the indexing of the filename patterns that follow.  Module names
       are:  text  (plain  text), HTML (HTML and XHTML), ID3 (ID3 tags), LaTeX
       (LaTeX source), Mail (mail and news messages), Man (Unix manual pages),
       and RTF (Rich Text Format).
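
       For example, the following lines (with patterns chosen only for
       illustration) would assign files to three of the modules listed
       above:

            IncludeFile     HTML   *.html *.htm
            IncludeFile     text   *.txt
            IncludeFile     LaTeX  *.tex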

       An IncludeMeta configuration file line is of the form:

            name[=new_name] ...

       It is like a set variable except arguments may optionally  be  followed
       by reassignments.  For example, a value of:

            adr=address

       says  to include and index the words associated with the meta name adr,
       but to store the name as address in the generated index  file  so  that
       queries would use address rather than adr.
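
       Written out as a complete line, and combined with a second,
       hypothetical meta name (author) that is not renamed, this might look
       like:

            IncludeMeta     adr=address author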

       A SocketAddress configuration file line is of the form:

            [ host : ] port

       that is: an optional host and colon followed by a port number.  The
       host may be a host name, an IPv4 address (in dot-decimal notation),
       an IPv6 address (in colon notation) if supported by the operating
       system, or the * character meaning ``any IP address.''  Omitting
       the host and colon also means ``any IP address.''
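
       The forms can be sketched as the following alternative lines, where
       1967 is an arbitrary port number used only for illustration:

            # Any IP address: host omitted, or given as *.
            SocketAddress   1967
            SocketAddress   *:1967

            # A host name or an IPv4 address.
            SocketAddress   localhost:1967
            SocketAddress   127.0.0.1:1967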

FILTERS
   Filtering files
       Via the FilterFile configuration file variable, files matching patterns
       can be filtered prior to indexing or extraction.  For example, to
       uncompress bzip2'd, gzip'd, and compress'd files prior to indexing or
       extraction, the FilterFile variable lines in a configuration file would
       be:

            FilterFile *.bz2  bunzip2 -c %f > @%F
            FilterFile *.gz   gunzip -c %f > @%F
            FilterFile *.Z    uncompress -c %f > @%F

       Given that, a filename such as foo.txt.gz would become foo.txt.  If
       files having txt extensions should be indexed, then foo.txt will be.
       Note that the command on the FilterFile line must not simply be:

            gunzip @%f                    # WRONG!

       because  gunzip  will replace the compressed file with the uncompressed
       one.

       Here's an example to convert PDF to plain text for indexing  using  the
       xpdf(1) package's pdftotext command:

            FilterFile *.pdf  pdftotext %f @%F.txt

       A file can be filtered more than once prior to indexing or extraction,
       i.e., filters can be ``chained'' together.  For example, if the
       uncompression and PDF examples shown above are used together, compressed
       PDF files, i.e., those whose filenames end with one of the double
       extensions .pdf.bz2, .pdf.gz, or .pdf.Z, will also be indexed or
       extracted.

       Note, however, that just because a filename has an extension for which
       a filter has been specified does not mean that a file will be filtered
       and subsequently indexed or extracted.  When index++ or extract++
       encounters a file having an extension for which a filter has been
       specified, it performs the filename substitution(s) on it first to
       determine what the target filename would be.  If the extension of that
       filename should be indexed or extracted (because it is among the set of
       extensions specified with either the -e or --pattern options or the
       IncludeFile variable or is not among the set specified with either the
       -E or --no-pattern options or the ExcludeFile variable), then the
       filter(s) are executed to create it.
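
       As a sketch of that decision, suppose a configuration file contains
       the following lines (the IncludeFile line and the filenames below are
       assumptions made for illustration):

            IncludeFile text  *.txt
            FilterFile *.gz   gunzip -c %f > @%F
            FilterFile *.pdf  pdftotext %f @%F.txt

       For a file named report.pdf.gz, substitution in the first filter
       gives report.pdf, and substitution in the second gives report.txt.
       Because report.txt matches *.txt, both filters would be run and the
       result indexed.  For a file named archive.tar.gz, the substituted name
       is archive.tar, which matches neither a filter pattern nor *.txt, so
       gunzip would presumably never be run on it.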

   Filtering attachments
       Via the FilterAttachment configuration file variable, e-mail
       attachments whose MIME types match particular patterns can be filtered
       and thus indexed.  An attachment is written to a temporary file by
       itself (after having been base-64 decoded, if necessary) and a filter
       command is called on that file.

       For example, to convert a PDF attachment to plain text so it can be
       indexed, the FilterAttachment variable line in a configuration file
       would be:

            FilterAttachment application/pdf    pdftotext %f @%F.txt

       MIME types must be specified entirely in lower case.  Patterns  can  be
       useful for MIME types.  For example:

            FilterAttachment application/*word  extract++ -f %f > @%F.txt

       can be used regardless of whether the MIME type is application/msword
       (the official MIME type for Microsoft Word documents) or
       application/vnd.ms-word (an older version).

       The MIME types that are built into index++(1) are: text/plain,
       text/enriched (but only if the RTF module is compiled in), text/html
       (but only if the HTML module is compiled in), text/*vcard,
       message/rfc822, and multipart/something (where something is one of:
       alternative, mixed, or parallel).  FilterAttachment variable lines can
       override the handling of the built-in MIME types.

       Unlike file filters, attachment filters must convert directly to plain
       text and cannot be ``chained'' together.  (This restriction exists
       because there is no way to know what any intermediate MIME types would
       be to apply more filters.)

SEE ALSO
       bzip2(1), compress(1), extract++(1), gunzip(1), gzip(1), index++(1),
       pdftotext(1), search++(1), uncompress(1), glob(7)

       Nathaniel S. Borenstein.  ``The text/enriched MIME Content-type,''
       Request for Comments 1563, Network Working Group of the Internet
       Engineering Task Force, January 1994.

       David H. Crocker.  ``Standard for the Format of ARPA Internet Text
       Messages,'' Request for Comments 822, Department of Electrical
       Engineering, University of Delaware, August 1982.

       Frank Dawson and Tim Howes.  ``vCard MIME Directory Profile,'' Request
       for Comments 2426, Network Working Group of the Internet Engineering
       Task Force, September 1998.

       Ned Freed and Nathaniel S. Borenstein.  ``Multipurpose Internet Mail
       Extensions (MIME) Part One: Format of Internet Message Bodies,''
       Request for Comments 2045, RFC 822 Extensions Working Group of the
       Internet Engineering Task Force, November 1996.

       International Organization for Standardization.  ``ISO/IEC 9945-2:
       Information Technology -- Portable Operating System Interface (POSIX)
       -- Part 2: Shell and Utilities,'' 1993.

       Steven Pemberton, et al.  XHTML 1.0: The  Extensible  HyperText  Markup
       Language, World Wide Web Consortium, January 2000.

AUTHOR
       Paul J. Lucas <pauljlucas@mac.com>

SWISH++                          June 16, 2005                 swish++.conf(5)
