Fsdb(3)                User Contributed Perl Documentation               Fsdb(3)

NAME
     Fsdb - a flat-text database for shell scripting

SYNOPSIS
     Fsdb, the flatfile streaming database, is a package of commands for
     manipulating flat-ASCII databases from shell scripts.  Fsdb is useful to
     process medium amounts of data (with very little data you'd do it by hand,
     with megabytes you might want a real database).  Fsdb was known as Jdb from
     1991 to Oct. 2008.

     Fsdb is very good at doing things like:

     *   extracting measurements from experimental output

     *   examining data to address different hypotheses

     *   joining data from different experiments

     *   eliminating/detecting outliers

     *   computing statistics on data (mean, confidence intervals, correlations,
         histograms)

     *   reformatting data for graphing programs

     Fsdb is built around the idea of a flat text  file  as  a  database.   Fsdb
     files  (by convention, with the extension .fsdb), have a header documenting
     the schema (what the columns mean), and then each line represents  a  data-
     base record (or row).

     For example:

             #fsdb experiment duration
             ufs_mab_sys 37.2
             ufs_mab_sys 37.3
             ufs_rcp_real 264.5
             ufs_rcp_real 277.9

     This is a simple file with four experiments (the rows), each with a
     description and run time in the first and second columns.
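
     As a rough illustration of how the header carries the schema, the sketch
     below lists each column name with its zero-based position using only
     standard awk.  It is not part of Fsdb: the function name fsdb_schema is
     hypothetical, and the sketch assumes a plain "#fsdb" header with no
     options.

```shell
# fsdb_schema: print "position name" for each column named in the header.
# Hypothetical helper for illustration only; assumes no header options.
fsdb_schema() {
    awk 'NR == 1 && $1 == "#fsdb" {
        for (i = 2; i <= NF; i++) print i - 2, $i
    }'
}

# Example: the two-column file shown above.
printf '%s\n' '#fsdb experiment duration' 'ufs_mab_sys 37.2' | fsdb_schema
```

     For the example file this prints "0 experiment" and "1 duration".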

     Rather  than  hand-code  scripts  to  do  each  special case, Fsdb provides
     higher-level functions.  Although it's often easy to throw together a custom
     script  to  do any single task, I believe that there are several advantages
     to using Fsdb:

     *   these programs provide a higher level interface than plain Perl, so

         **  Fewer lines of simpler code:

                 dbrow '_experiment eq "ufs_mab_sys"' | dbcolstats duration

             Picks out just one type of experiment and  computes  statistics  on
             it, rather than:

                 while (<>) { split; $sum+=$F[1]; $ss+=$F[1]**2; $n++; }
                 $mean = $sum / $n; $std_dev = ...

             in dozens of places.

     *   the library uses names for columns, so

         **  No more $F[1], use "_duration".

         **  New or different order columns?  No changes to your scripts!

         Thus if your experiment gets more complicated with a size parameter, so
         your log changes to:

                 #fsdb experiment size duration
                 ufs_mab_sys 1024 37.2
                 ufs_mab_sys 1024 37.3
                 ufs_rcp_real 1024 264.5
                 ufs_rcp_real 1024 277.9
                 ufs_mab_sys 2048 45.3
                 ufs_mab_sys 2048 44.2

         Then  the  previous scripts still work, even though duration is now the
         third column, not the second.

     *   A series of actions is self-documenting (the provenance of processing
         done to produce each output is recorded in comments).

         **  No more wondering what hacks were used to compute the  final  data,
             just look at the comments at the end of the output.

         For example, the commands

             dbrow '_experiment eq "ufs_mab_sys"' | dbcolstats duration

         add to the end of the output the lines

             #    | dbrow _experiment eq "ufs_mab_sys"
             #    | dbcolstats duration

     *   The  library  is  mature,  supporting large datasets (more than 100GB),
         parallelism, corner cases, error handling, backed by an automated  test
         suite.

         **  No  more  puzzling  about  bad  output  because  your custom script
             skimped on error checking.

         **  No more memory thrashing when you try to sort ten million records.

         **  Makes use of multiple cores in your computer when it  can,  because
             each  pipeline  component  runs  in parallel, and because key tools
             (dbsort, dbmapreduce) run in parallel when possible.

     *   Fsdb supports Perl scripting (in addition to shell scripting), with li-
         braries to do Fsdb input and output, and easy  support  for  pipelines.
         The shell script

             dbcol name test1 | dbroweval '_test1 += 5;'

         can be written in Perl as:

             dbpipeline(dbcol(qw(name test1)), dbroweval('_test1 += 5;'));

     *   Fsdb-3.x  supports  optional  typing,  allowing  languages that require
         casting on import to do so automatically.

     (The disadvantage is that you need to learn what functions Fsdb provides.)
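
     The claim that scripts survive a change in column order can be sketched
     with standard awk: look the column up by name in the header, then use
     whatever position it has.  This is only an illustration, not how
     dbcolstats is implemented, and fsdb_mean is a hypothetical name.

```shell
# fsdb_mean COLUMN: print the mean of the named column.
# Hypothetical helper; a sketch of name-based lookup, not dbcolstats.
fsdb_mean() {
    awk -v want="$1" '
        NR == 1 {                       # header: find the column by name
            for (i = 2; i <= NF; i++)
                if ($i == want) col = i - 1
            next
        }
        /^#/ { next }                   # skip comment lines
        col  { sum += $col; n++ }       # only count rows once the column is known
        END  { if (n) print sum / n }
    '
}

# The same call works for both layouts, although duration moves
# from the second column to the third.
printf '%s\n' '#fsdb experiment duration' \
    'ufs_mab_sys 37.2' 'ufs_mab_sys 37.3' | fsdb_mean duration
printf '%s\n' '#fsdb experiment size duration' \
    'ufs_mab_sys 1024 37.2' 'ufs_mab_sys 1024 37.3' | fsdb_mean duration
```

     Both calls print 37.25.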

     Fsdb is built on flat-ASCII databases.  By  storing  data  in  simple  text
     files  and  processing  it  with pipelines it is easy to experiment (in the
     shell) and look at the output.  To the best of my knowledge,  the  original
     implementation  of  this idea was "/rdb", a commercial product described in
     the book UNIX relational database management: application development in
     the UNIX environment by Rod Manis, Evan Schaffer, and Robert Jorgensen
     (1988 by Prentice Hall, and also at the  web  page  <http://www.rdb.com/>).
     Fsdb  is an incompatible re-implementation of their idea without any accel-
     erated indexing or forms support.  (But it's free, and probably has  better
     statistics!).

     Fsdb-2.x will exploit multiple processors or cores, and provides Perl-level
     support  for input, output, and threaded-pipelines.  (As of Fsdb-2.44 it no
     longer uses Perl threading, just processes, since they are faster.)

     Installation instructions follow at the end of this document.  Fsdb-2.x re-
     quires Perl 5.8 to run.  All commands have manual pages and  provide  usage
     with  the  "--help"  option.   All commands are backed by an automated test
     suite.

     The  most  recent  version  of  Fsdb   is   available   on   the   web   at
     <http://www.isi.edu/~johnh/SOFTWARE/FSDB/index.html>.

WHAT'S NEW
   3.16, 2026-06-08  Make dbcoldefine default to fill in null fields with the
     Fsdb empty value.
     ENHANCEMENT
         dbcoldefine now fills columns that have no value with  the  Fsdb  empty
         value.  Use "--no-fill-empty" to disable.

     ENHANCEMENT
         tcpdump_to_db now handles epoch-style time formats and UDP, and option-
         ally  splits  out  ports  with  "--ports".  (But: tcpdump is not really
         parsable; use tshark instead.)

README CONTENTS
     executive summary
     what's new
     README CONTENTS
     installation
     basic data format
     basic data manipulation
     list of commands
     another example
     a gradebook example
     a password example
     history
     related work
     release notes
     copyright
     comments

INSTALLATION
     Fsdb now uses the standard Perl build and installation from
     ExtUtils::MakeMaker(3), so the quick answer to installation is to type:

         perl Makefile.PL
         make
         make test
         sudo make install

     Or, if you want to install it somewhere else, change the first line to

         perl Makefile.PL PREFIX=$HOME

     then run the other commands ("make; make test; make install", now without
     the sudo), and it will go in your home directory's bin, etc.  (See
     ExtUtils::MakeMaker(3) for more details.)

     Fsdb requires perl 5.8 or later.

     A test suite is available; run it with

         make test

     In the past, ports existed for FreeBSD and macOS.  If someone running
     one of those OSes wants to contribute a new port, please let me know.

BASIC DATA FORMAT
     These programs are based on the idea of storing data in simple ASCII files.  A
     database is a file with one header line and then  data  or  comment  lines.
     For example:

             #fsdb account passwd uid gid fullname homedir shell
             johnh * 2274 134 John_Heidemann /home/johnh /bin/bash
             greg * 2275 134 Greg_Johnson /home/greg /bin/bash
             root * 0 0 Root /root /bin/bash
             # this is a simple database

     The  header  line  must  be  first and begins with "#fsdb".  There are rows
     (records) and columns (fields), just like in a  normal  database.   Comment
     lines begin with "#".  Column names are any string not containing spaces or
     single quotes (although it is prudent to keep them alphanumeric with
     underscores).

     Columns can optionally include type annotations by following the name with
     :t where t is some type.  (Types are not used in Perl, but are relevant in
     the Python and Go Fsdb bindings.)  Types use a subset of Perl pack
     specifiers: c, s, l, q are signed 8-, 16-, 32-, and 64-bit integers, f is
     a float, d is a double float, a is a UTF-8 string, and > and < can force
     big or little endianness.
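
     To make the name:t convention concrete, the following sketch splits each
     header entry into its name and type with standard awk.  It is an
     illustration only: fsdb_types is a hypothetical name, the sample header
     is invented for this example, and the word "untyped" printed for columns
     without an annotation is a placeholder chosen here, not an Fsdb
     convention.

```shell
# fsdb_types: print "name type" for each column in a "#fsdb" header.
# Hypothetical helper; assumes a header with no options such as -F.
fsdb_types() {
    awk 'NR == 1 && $1 == "#fsdb" {
        for (i = 2; i <= NF; i++) {
            n = split($i, part, ":")     # "size:l" -> part[1]="size", part[2]="l"
            print part[1], (n > 1 ? part[2] : "untyped")
        }
    }'
}

# Example header (invented): a string column, a 32-bit integer column,
# and a column with no annotation.
printf '%s\n' '#fsdb experiment:a size:l duration' | fsdb_types
```

     This prints "experiment a", "size l", and "duration untyped", one per
     line.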

     By  default,  columns are delimited by any amount of whitespace.  With this
     default configuration, the contents of a field cannot  contain  whitespace.
     However,  this limitation can be relaxed by changing the field separator as
     described below.

     The big advantage of simple flat-text databases is that it is usually  easy
     to massage data into this format, and it's reasonably easy to take data out
     of  this format into other (text-based) programs, like gnuplot, jgraph, and
     LaTeX.  Think Unix.  Think pipes.  (Or even output to Excel and HTML if you
     prefer.)

     Since no-whitespace in columns was a problem for some applications, there's
     an option which relaxes this rule.  You can specify the field separator in
     the table header with "-F x" where "x" is a code for the new field
     separator.  A full list of codes is at dbfilealter(1), but two common
     special values are "-F t", which is a separator of a single tab character,
     and "-F S", a separator of two spaces.  Both allow (single) spaces in
     fields.  An example:

             #fsdb -F S account passwd uid gid fullname homedir shell
             johnh  *  2274  134  John Heidemann  /home/johnh  /bin/bash
             greg  *  2275  134  Greg Johnson  /home/greg  /bin/bash
             root  *  0  0  Root  /root  /bin/bash
             # this is a simple database

     See dbfilealter(1) for more details.  Regardless of what the column
     separator is for the body of the data, it's always whitespace in the header.
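
     The effect of the two-space separator can be sketched with standard awk,
     which treats a multi-character field separator as a pattern.  This is
     not Fsdb code: fsdb_field_S is a hypothetical helper, and it handles
     only the "-F S" case.

```shell
# fsdb_field_S N: print field N of each data row, splitting on exactly
# two spaces so that single spaces stay inside a field.
# Hypothetical helper for illustration only.
fsdb_field_S() {
    awk -F'  ' -v n="$1" '!/^#/ { print $n }'   # skip header and comments
}

# Example: the fullname field (fifth) keeps its embedded space.
printf '%s\n' \
    '#fsdb -F S account passwd uid gid fullname homedir shell' \
    'johnh  *  2274  134  John Heidemann  /home/johnh  /bin/bash' |
    fsdb_field_S 5
```

     The example prints "John Heidemann" as a single field.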

     There's  also  a  third  format:  a "list".  Because it's often hard to see
     what's in columns past the first two, in list format each "column" is on a
     separate  line.   The  programs  dblistize and dbcolize convert to and from
     this format, and all programs work with either format.  The command

         dbfilealter -R C  < DATA/passwd.fsdb

     outputs:

             #fsdb -R C account passwd uid gid fullname homedir shell
             account:  johnh
             passwd:   *
             uid:      2274
             gid:      134
             fullname: John_Heidemann
             homedir:  /home/johnh
             shell:    /bin/bash

             account:  greg
             passwd:   *
             uid:      2275
             gid:      134
             fullname: Greg_Johnson
             homedir:  /home/greg
             shell:    /bin/bash

             account:  root
             passwd:   *
             uid:      0
             gid:      0
             fullname: Root
             homedir:  /root
             shell:    /bin/bash

             # this is a simple database
             #  | dblistize

     See dbfilealter(1) for more details.
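
     The list layout can be approximated with a few lines of standard awk.
     This sketch is not dblistize or dbfilealter: fsdb_to_list is a
     hypothetical name, it assumes whitespace-separated input with no header
     options, and it does not align the values as the real output above does.

```shell
# fsdb_to_list: rewrite column-format input as one "name: value" line
# per field, with a blank line after each record.
# Hypothetical helper for illustration only.
fsdb_to_list() {
    awk '
        NR == 1 {                          # header: remember the column names
            for (i = 2; i <= NF; i++) name[i - 1] = $i
            $1 = "#fsdb -R C"              # mark the output as list format
            print
            next
        }
        /^#/ { print; next }               # pass comments through unchanged
        {
            for (i = 1; i <= NF; i++) print name[i] ": " $i
            print ""                       # a blank line ends each record
        }
    '
}

printf '%s\n' '#fsdb account uid' 'root 0' | fsdb_to_list
```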

BASIC DATA MANIPULATION
     A number of programs exist to manipulate databases.  Complex functions  can
     be  made by stringing together commands with shell pipelines.  For example,
     to print the home directories of everyone with "John" in their names, you
     would do:

             cat DATA/passwd | dbrow '_fullname =~ /John/' | dbcol homedir

     The output might be:

             #fsdb homedir
             /home/johnh
             /home/greg
             # this is a simple database
             #  | dbrow _fullname =~ /John/
             #  | dbcol homedir

     (Notice that comments are appended to the output listing each command, pro-
     viding an automatic audit log.)
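
     The audit-log convention is easy to imitate in a custom filter.  The
     sketch below, written with standard awk rather than Fsdb, passes the
     header and existing comments through and appends one trailing comment
     naming itself.  The name keep_rows is hypothetical, and the sketch
     leaves comments where it finds them rather than reproducing Fsdb's
     exact placement.

```shell
# keep_rows PATTERN: keep the header, comments, and data rows matching
# PATTERN, then append a trailing comment recording this step.
# Hypothetical helper for illustration only.
keep_rows() {
    awk -v pat="$1" '
        NR == 1 || /^#/ { print; next }    # header and comments pass through
        $0 ~ pat        { print }          # data rows are filtered
        END             { print "#  | keep_rows " pat }
    '
}

printf '%s\n' '#fsdb account fullname' \
    'johnh John_Heidemann' 'root Root' \
    '# this is a simple database' | keep_rows John
```

     The output ends with the original comment followed by
     "#  | keep_rows John".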

     In addition to typical database functions (select, join,  etc.)  there  are
     also a number of statistical functions.

     The  real  power of Fsdb is that one can apply arbitrary code to rows to do
     powerful things.

             cat DATA/passwd | dbroweval '_fullname =~ s/(\w+)_(\w+)/$2,_$1/'

     converts "John_Heidemann" into "Heidemann,_John".  Not too much  more  work
     could split fullname into firstname and lastname fields.

     Or:

             cat DATA/passwd | dbcolcreate sort | dbroweval -b 'use Fsdb::Support' \
                     '_sort = _fullname; _sort =~ s/_/ /g; _sort = fullname_to_sort(_sort);'

TALKING ABOUT COLUMNS
     An  advantage  of Fsdb is that you can talk about columns by name (symboli-
     cally) rather than simply by their positions.  So  in  the  above  example,
     "dbcol homedir" pulled out the home directory column, and "dbrow '_fullname
     =~ /John/'" matched against column fullname.

     In  general,  you can use the name of the column listed on the "#fsdb" line
     to identify it in most programs, and _name to identify it in code.

     Some alternatives for flexibility:

     *   Numeric values identify columns positionally, numbering from 0.   So  0
         or _0 is the first column, 1 is the second, etc.

     *   In code, _last_columnname gets the value from columnname's previous row.

     See dbroweval(1) for more details about writing code.
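
     The idea behind _last_columnname (using the previous row's value) can be
     sketched outside Fsdb with standard awk.  fsdb_delta is a hypothetical
     helper that prints the change in a named column from one row to the
     next; it only illustrates the concept and is not dbroweval or dbrowdiff.

```shell
# fsdb_delta COLUMN: for each data row after the first, print the value
# of COLUMN minus its value in the previous row.
# Hypothetical helper for illustration only.
fsdb_delta() {
    awk -v want="$1" '
        NR == 1 {                          # header: find the column by name
            for (i = 2; i <= NF; i++)
                if ($i == want) col = i - 1
            next
        }
        /^#/ || !col { next }              # skip comments; skip all if no such column
        seen { print $col - last }         # difference from the previous row
        { last = $col; seen = 1 }          # remember this row for the next one
    '
}

printf '%s\n' '#fsdb time' 10 15 22 | fsdb_delta time
```

     For the three rows in the example this prints 5 and then 7.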

LIST OF COMMANDS
     Enough  said.   I'll  summarize  the commands, and then you can experiment.
     For a summary of each command, run it with the argument "--help" (or "-?"
     if you prefer).  Full manual pages can be found by running the command with
     the argument "--man", or by running the Unix command "man dbcol" or
     whatever program you want.

   TABLE CREATION
     dbcolcreate
         add columns to a database

     dbcoldefine
         set the column headings for a non-Fsdb file

   TABLE MANIPULATION
     dbcol
         select columns from a table

     dbrow
         select rows from a table

     dbsort
         sort rows based on a set of columns

     dbjoin
         compute the natural join of two tables

     dbcolrename
         rename a column

     dbcolmerge
         merge two columns into one

     dbcolsplittocols
         split one column into two or more columns

     dbcolsplittorows
         split one column into multiple rows

     dbfilepivot
         "pivots" a file, converting multiple rows corresponding to the same en-
         tity into a single row with multiple columns.

     dbfilevalidate
         check that db file doesn't have some common errors

   COMPUTATION AND STATISTICS
     dbcolstats
         compute statistics over a column (mean, etc., optionally median)

     dbmultistats
         group rows by some key value, then compute stats (mean, etc.) over each
         group (equivalent to dbmapreduce with dbcolstats as the reducer)

     dbmapreduce
         group  rows  (map)  and  then apply an arbitrary function to each group
         (reduce)

     dbrvstatdiff
         compare two sample distributions (mean/conf interval/T-test)

     dbcolmovingstats
         compute moving statistics over a column of data

     dbcolstatscores
         compute Z-scores and T-scores over one column of data

     dbcolpercentile
         compute the rank or percentile of a column

     dbcolhisto
         compute histograms over a column of data

     dbcolscorrelate
         compute the coefficient of correlation over several columns

     dbcolsdecimate
         drop rows selectively, keeping large changes and periodic samples

     dbcolsregression
         compute linear regression and correlation for two columns

     dbrowaccumulate
         compute a running sum over a column of data

     dbrowcount
         count the number of rows (a subset of dbstats)

     dbrowdiff
         compute differences in a column between successive rows of a table

     dbrowenumerate
         number each row

     dbroweval
         run arbitrary Perl code on each row

     dbrowuniq
         count/eliminate identical rows (like Unix uniq(1))

     dbfilediff
         compare fields on rows of a file (something like Unix diff(1))

   OUTPUT CONTROL
     dbcolneaten
         pretty-print columns

     dbfilealter
         convert between column or list format, or change the column separator

     dbfilestripcomments
         remove comments from a table

     dbformmail
         generate a script that sends form mail based on each row

   CONVERSIONS
     (These programs convert data into fsdb.  See their web pages for details.)

     cgi_to_db
         <http://stein.cshl.org/boulder/>

     combined_log_format_to_db
         <http://httpd.apache.org/docs/2.0/logs.html>

     html_table_to_db
         HTML tables to fsdb (assuming they're reasonably formatted).

     kitrace_to_db
         <http://ficus-www.cs.ucla.edu/ficus-members/geoff/kitrace.html>

     ns_to_db
         <http://mash-www.cs.berkeley.edu/ns/>

     sqlselect_to_db
         the output of SQL SELECT tables to db

     tabdelim_to_db
         spreadsheet tab-delimited files to db

     tcpdump_to_db
         (see man tcpdump(8) on any reasonable system)

     xml_to_db
         XML input to fsdb, assuming it's very regular

     (And out of fsdb:)

     db_to_csv
         Comma-separated-value format from fsdb.

     db_to_html_table
         simple conversion of Fsdb to html tables

   STANDARD OPTIONS
     Many programs have common options:

     -? or --help
         Show basic usage.

     -N or --new-name
         When a command creates a new column like dbrowaccumulate's "accum",
         this option lets one override the default name of that new column.

     -T TmpDir
         Where to put temporary files.  Also uses the environment variable
         TMPDIR, if -T is not specified.  Default is /tmp.

     -c FRACTION or --confidence FRACTION
         Specify confidence interval FRACTION (dbcolstats, dbmultistats, etc.)

     -C S or --element-separator S
         Specify column separator S (dbcolsplittocols, dbcolmerge).

     -d or --debug
         Enable debugging (may be repeated for greater effect in some cases).

     -a or --include-non-numeric
         Compute stats over all data (treating non-numbers as zeros).  (By
         default, things that can't be treated as numbers are ignored for stats
         purposes.)

     -S or --pre-sorted
         Assume the data is pre-sorted.  May be repeated to disable verification
         (saving a small amount of work).

     -e E or --empty E
         Give value E as the value for empty (null) records.

     -i I or --input I
         Input data from file I.

     -o O or --output O
         Write data out to file O.

     --header H
         Use H as the full Fsdb header, rather than reading a header from the
         input.  This option is particularly useful when using Fsdb under
         Hadoop, where split files don't have headers.

     --nolog
         Skip logging the program in a trailing comment.

     When giving Perl code (in dbrow and dbroweval) column names can be embedded
     if preceded by underscores.  (Look at dbrow(1) or dbroweval(1) for
     examples.)

     Most  programs run in constant memory and use temporary files if necessary.
     Exceptions are  dbcolneaten,  dbcolpercentile,  dbmapreduce,  dbmultistats,
     dbrowsplituniq.

   STANDARD SORTING OPTIONS
     A number of programs do sorting, or depend on defining an ordering of rows.
     Such programs use these standard sorting options:

     -r or --descending
         sort in reverse order (high to low)

     -R or --ascending
         sort in normal order (low to high)

     -t or --type-inferred-sorting
         sort fields by type (numeric or lexicographic), automatically

     -n or --numeric
         sort numerically

     -N or --lexical
         sort lexicographically

ANOTHER EXAMPLE
     Take the raw data in "DATA/http_bandwidth", put a header on it
     ("dbcoldefine size bw"), take statistics of each category ("dbmultistats
     -k size bw"), pick out the relevant fields ("dbcol size mean stddev
     pct_rsd"), and you get:

             #fsdb size mean stddev pct_rsd
             1024    1.4962e+06      2.8497e+05      19.047
             10240   5.0286e+06      6.0103e+05      11.952
             102400  4.9216e+06      3.0939e+05      6.2863
             #  | dbcoldefine size bw
             #  | /home/johnh/BIN/DB/dbmultistats -k size bw
             #  | /home/johnh/BIN/DB/dbcol size mean stddev pct_rsd

     (The whole command was:

             cat DATA/http_bandwidth |
             dbcoldefine size bw |
             dbmultistats -k size bw |
             dbcol size mean stddev pct_rsd

     all on one line.)

     Then post-process them to get rid of the  exponential  notation  by  adding
     this to the end of the pipeline:

         dbroweval '_mean = sprintf("%8.0f", _mean); _stddev = sprintf("%8.0f", _stddev);'

     (Actually, this step is no longer required since dbcolstats now uses a dif-
     ferent default format.)

     giving:

             #fsdb      size    mean    stddev  pct_rsd
             1024     1496200          284970        19.047
             10240    5028600          601030        11.952
             102400   4921600          309390        6.2863
             #  | dbcoldefine size bw
             #  | dbmultistats -k size bw
             #  | dbcol size mean stddev pct_rsd
             #  | dbroweval   { _mean = sprintf("%8.0f", _mean); _stddev = sprintf("%8.0f", _stddev); }

     In a few lines, raw data is transformed to processed output.

     Suppose  you  expect  there  is an odd distribution of results of one data-
     point.  Fsdb can easily produce a CDF (cumulative distribution function) of
     the data, suitable for graphing:

         cat DB/DATA/http_bandwidth | \
             dbcoldefine size bw | \
             dbrow '_size == 102400' | \
             dbcol bw | \
             dbsort -n bw | \
             dbrowenumerate | \
             dbcolpercentile count | \
             dbcol bw percentile | \
             xgraph

     The steps, roughly: 1. get the raw input data and turn it into fsdb format,
     2. pick out just the relevant column (for efficiency) and sort it,  3.  for
     each data point, assign a CDF percentage to it, 4. pick out the two columns
     to graph and show them.
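
     The idea of step 3 can be sketched without Fsdb: after a numeric sort,
     the i-th of n values gets the fraction i/n.  This is an assumption about
     what a CDF percentage means here, not a statement of the exact formula
     dbcolpercentile uses, and the function name cdf is hypothetical.  Unlike
     the streaming Fsdb tools, this sketch holds all values in memory.

```shell
# cdf: read one number per line and print "value fraction", where
# fraction is rank/n after a numeric sort.
# Hypothetical helper for illustration only.
cdf() {
    sort -n | awk '
        { v[NR] = $1 }                     # remember every value in sorted order
        END {
            for (i = 1; i <= NR; i++)
                printf "%s %.2f\n", v[i], i / NR
        }
    '
}

printf '%s\n' 30 10 20 40 | cdf
```

     The four sample values come out as 10 0.25, 20 0.50, 30 0.75, and
     40 1.00.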

A GRADEBOOK EXAMPLE
     The  first  commercial program I wrote was a gradebook, so here's how to do
     it with Fsdb.

     Format your data like DATA/grades.

             #fsdb name email id test1
             a a@ucla.example.edu 1 80
             b b@usc.example.edu 2 70
             c c@isi.example.edu 3 65
             d d@lmu.example.edu 4 90
             e e@caltech.example.edu 5 70
             f f@oxy.example.edu 6 90

     Or if your students have spaces in their names, use "-F S" and  two  spaces
     to separate each column:

             #fsdb -F S name email id test1
             alfred aho  a@ucla.example.edu  1  80
             butler lampson  b@usc.example.edu  2  70
             david clark  c@isi.example.edu  3  65
             constantine drovolis  d@lmu.example.edu  4  90
             deborah estrin  e@caltech.example.edu  5  70
             sally floyd  f@oxy.example.edu  6  90

     To compute statistics on an exam, do

             cat DATA/grades | dbstats test1 |dblistize

     giving

             #fsdb -R C  ...
             mean:        77.5
             stddev:      10.84
             pct_rsd:     13.987
             conf_range:  11.377
             conf_low:    66.123
             conf_high:   88.877
             conf_pct:    0.95
             sum:         465
             sum_squared: 36625
             min:         65
             max:         90
             n:           6
             ...
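
     The mean and stddev shown above are consistent with the usual sample
     formulas applied to the six test1 scores (mean = sum/n, and a standard
     deviation with an n-1 denominator).  That the n-1 form is what the tool
     uses is inferred here from the numbers shown, not stated by the source.
     A sketch in standard awk, with the hypothetical name mean_stddev:

```shell
# mean_stddev: read one number per line, print the mean and the sample
# standard deviation computed from the running sum and sum of squares.
# Hypothetical helper for illustration only.
mean_stddev() {
    awk '
        { sum += $1; ss += $1 * $1; n++ }
        END {
            if (n > 1) {
                mean = sum / n
                printf "%.1f %.2f\n", mean, sqrt((ss - n * mean * mean) / (n - 1))
            }
        }
    '
}

# The six test1 scores from DATA/grades above.
printf '%s\n' 80 70 65 90 70 90 | mean_stddev
```

     This prints "77.5 10.84", matching the mean and stddev lines above.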

     To do a histogram:

             cat DATA/grades | dbcolhisto -n 5 -g test1

     giving

             #fsdb low histogram
             65      *
             70      **
             75
             80      *
             85
             90      **
             #  | /home/johnh/BIN/DB/dbhistogram -n 5 -g test1

     Now  you  want  to  send  out  grades  to the students by e-mail.  Create a
     form-letter (in the file test1.txt):

             To: _email (_name)
             From: J. Random Professor <jrp@usc.example.edu>
             Subject: test1 scores

             _name, your score on test1 was _test1.
             86+   A
             75-85 B
             70-74 C
             0-69  F

     Generate the shell script that will send the mail out:

             cat DATA/grades | dbformmail test1.txt > test1.sh

     And run it:

             sh <test1.sh

     The last two steps can be combined:

             cat DATA/grades | dbformmail test1.txt | sh

     but I like to keep a copy of exactly what I send.

     At the end of the semester you'll want to compute grade totals  and  assign
     letter  grades.   Both  fall  out  of  dbroweval.   For example, to compute
     weighted total grades with a 40% midterm/60% final where the midterm is  84
     possible points and the final 100:

             dbcol -rv total |
             dbcolcreate total - |
             dbroweval '
                     _total = .40 * _midterm/84.0 + .60 * _final/100.0;
                     _total = sprintf("%4.2f", _total);
                     if (_final eq "-" || ( _name =~ /^_/)) { _total = "-"; };' |
             dbcolneaten
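
     As a worked check of the weighting formula, the sketch below evaluates
     it for a single student with standard awk.  The scores (63 of 84 on the
     midterm, 80 of 100 on the final) are invented for this example, and
     weighted_total is a hypothetical name.

```shell
# weighted_total MIDTERM FINAL: 40% midterm (out of 84 points) plus
# 60% final (out of 100 points), formatted as in the pipeline above.
# Hypothetical helper for illustration only.
weighted_total() {
    awk -v m="$1" -v f="$2" \
        'BEGIN { printf "%4.2f\n", .40 * m / 84.0 + .60 * f / 100.0 }'
}

weighted_total 63 80
```

     Here the midterm contributes .40 * 0.75 = 0.30 and the final
     .60 * 0.80 = 0.48, so the total printed is 0.78.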

     If  you  got the data originally from a spreadsheet, save it in "tab-delim-
     ited" format and convert it with tabdelim_to_db (run tabdelim_to_db -?  for
     examples).

A PASSWORD EXAMPLE
     To convert the Unix password file to db:

             cat /etc/passwd | sed 's/:/  /g'| \
                     dbcoldefine -F S login password uid gid gecos home shell \
                     >passwd.fsdb

     To convert the group file

             cat /etc/group | sed 's/:/  /g' | \
                     dbcoldefine -F S group password gid members \
                     >group.fsdb

     To  show the names of the groups that div7-members are in (assuming DIV7 is
     in the gecos field):

             cat passwd.fsdb | dbrow '_gecos =~ /DIV7/' | dbcol login gid | \
                     dbjoin -i - -i group.fsdb gid | dbcol login group

SHORT EXAMPLES
     Which Fsdb programs are the most  complicated  (based  on  number  of  test
     cases)?

             ls TEST/*.cmd | \
                     dbcoldefine test | \
                     dbroweval '_test =~ s@^TEST/([^_]+).*$@$1@' | \
                     dbrowuniq -c | \
                     dbsort -nr count | \
                     dbcolneaten

     (Answer: dbmapreduce, then dbcolstats, dbfilealter and dbjoin.)

     Stats on an exam (in $FILE, where $COLUMN is the name of the exam)?

             cat $FILE | dbcolstats -q 4 $COLUMN | dblistize | dbstripcomments

             cat $FILE | dbcolhisto -g -n 20 $COLUMN | dbcolneaten | dbstripcomments

     Merging the hw1 column from file hw1.fsdb into grades.fsdb, assuming
     there's a common student id in column "id":

             dbcol id hw1 <hw1.fsdb >t.fsdb

             dbjoin -a -e - grades.fsdb t.fsdb id | \
                 dbsort  name | \
                 dbcolneaten >new_grades.fsdb

     Merging two fsdb files with the same columns:

             cat file1.fsdb file2.fsdb >output.fsdb

     or if you want to clean things up a bit

             cat file1.fsdb file2.fsdb | dbstripextraheaders >output.fsdb

     or if you want to know where the data came from

             for i in 1 2
             do
                     dbcolcreate source $i < file$i.fsdb
             done >output.fsdb

     (assumes you're using a Bourne-shell compatible shell, not csh).

WARNINGS
     As with any tool, one should (which means must) understand the limits of
     the tool.

     All Fsdb tools should run in constant memory.  In some cases (such as
     dbcolstats with quartiles, where the whole input must be re-read), programs
     will spool data to disk if necessary.

     Most tools buffer one or a few lines of data, so memory will scale with the
     size of each line.  (So lines with many columns, or columns with lots of
     data, may cause large memory consumption.)

     All Fsdb tools should run in constant or at worst "n log n" time.

     All Fsdb tools use normal Perl math routines for computation.  I make
     every attempt to choose numerically stable algorithms (and I welcome
     feedback and suggestions for improvement), but normal rounding due to
     computer floating point approximations can result in inaccuracies when data
     spans a large range of precision.  (See for example the dbcolstats_extrema
     test cases.)

     Any requirements and limitations of each Fsdb tool are documented on its
     manual page.

     If any Fsdb program violates these assumptions, that is a bug  that  should
     be documented on the tool's manual page or ideally fixed.

     Fsdb does depend on Perl's correctness, and Perl (and Fsdb) have some bugs.
     Fsdb should work on perl from version 5.10 onward.

HISTORY
     There have been four major versions of Fsdb: fsdb-0.x was begun in 1991 for
     my personal use.  Fsdb 1.0 is a complete re-write of the pre-1995 versions,
     and  was distributed from 1995 to 2007.  Fsdb 2.0 is a significant re-write
     of the 1.x versions to systematically use a library and  threads  (although
     threads  were replaced with full processes in 2.44).  Fsdb 3.0 in 2022 adds
     type specifiers to the schema, mostly to  support  use  in  languages  with
     stronger typing (like Python, Go, and C).

     Fsdb  (in  its various forms) has been used extensively by its author since
     1991.  Since 1995 it's been used by two other researchers at UCLA and  sev-
     eral  at  ISI.   In  February 1998 it was announced to the Internet.  Since
     then it has found a few users, some outside where I work.

     Major changes:

     0.1 1991: begun for my personal use, to replace awk.
     1.0 1997-07-22: first public release.
     2.0 2008-01-25: rewrite to use a common library, and starting to use
         threads.
     2.12 2008-10-16: completion of the rewrite, and first RPM package.
     2.44 2013-10-02: replacing threads with processes for improved performance
     3.0 2022-04-04: adding type specifiers to the schema

   Fsdb 2.0 Rationale
     I've thought about fsdb-2.0 for many years, but it was started  in  earnest
     in 2007.  Fsdb-2.0 has the following goals:

     in-one-process processing
         While fsdb is great on the Unix command line as a pipeline between pro-
         grams,  it  should  [4malso[24m  be  possible  to set it up to run in a single
         process.  And if it does so, it should be able to avoid serializing and
         deserializing (converting to and from text) data between  each  module.
         (Accomplished  in  fsdb-2.0:  see dbpipeline, although still needs tun-
         ing.)

     clean IO API
         Fsdb's roots go back to perl4 and 1991,  so  the  fsdb-1.x  library  is
         very,  very  crufty.   More than just being ugly (but it was that too),
         this made tasks like reading from one file format and writing to
         another the application's job, when they should be the library's.
         (Accomplished in fsdb-1.15 and improved in 2.0: see Fsdb::IO.)

     normalized module APIs
         Because fsdb modules were added as needed over 10 years, sometimes  the
         module  APIs  became inconsistent.  (For example, the 1.x "dbcolcreate"
         required an empty value following the name of the new column, but other
         programs specify empty values  with  the  "-e"  argument.)   We  should
         smooth  over  these  inconsistencies.  (Accomplished as each module was
         ported in 2.0 through 2.7.)

     everyone handles all input formats
         Given a clean IO API, the distinction between "colized" and  "listized"
         fsdb  files  should  go  away.   Any program should be able to read and
         write files in any format.  (Accomplished in fsdb-2.1.)

     Fsdb-2.0 preserves backwards compatibility where possible,  but  breaks  it
     where  necessary  to  accomplish the above goals.  In August 2008, Fsdb-2.7
     was declared preferred over the 1.x versions.  Benchmarking in 2013 showed
     that threading performed much worse than just using pipes, because Perl's
     requirements for data that is shared between multiple threads are quite
     heavyweight.  Fsdb-2.44 therefore uses threading "style", but implemented
     with processes (via my "Freds" library).

   [1mFsdb And Multiple Processors[0m
     Fsdb's use of Unix pipelines means Fsdb automatically benefits from
     multiprocessor computers---each pipeline stage can run on a separate core.
     In addition, compute-intensive Fsdb modules like dbsort and dbmapreduce are
     explicitly multi-process and will use as many cores as they can, up to the
     number of cores on the local computer.

     Although Fsdb takes advantage of as much parallelism as it can, a
     five-stage pipeline won't necessarily saturate five cores.  Pipeline stages
     almost always have different amounts of work to do, and some stages are
     often data limited.  (Dbsort attempts as much parallelism as it can, and
     can run 10-way parallel or more over a large enough input dataset.  But it
     cannot sustain high parallelism because of the requirement that it produce
     one global output.)
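
     As a rough illustration of why a pipeline can occupy several cores at
     once, the sketch below uses three "sleep 1" stages as stand-ins for Fsdb
     commands.  The helper name pipeline_elapsed is hypothetical and not part
     of Fsdb.  Because the shell starts every stage of a pipeline as its own
     process, the stages overlap, and the whole pipeline should take about one
     second rather than three.

```shell
# Hypothetical helper (not part of Fsdb): run a three-stage pipeline and
# print the elapsed wall-clock time in whole seconds.
pipeline_elapsed() {
    start=$(date +%s)
    # Each stage is a separate process, so all three sleeps run concurrently.
    sleep 1 | sleep 1 | sleep 1
    end=$(date +%s)
    echo $((end - start))
}

pipeline_elapsed
```

     Real Fsdb stages are rarely this evenly balanced, which is why a long
     pipeline usually keeps fewer cores busy than it has stages.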

   [1mFsdb 3.0 Rationale[0m
     There are two motivations for adding optional typing to Fsdb.  First,
     languages such as Python and Go would really like type information.  As of
     2022 there are now users of those languages, so the basic system should
     support them.

     Second,  while  pure  text  is flexible, it's very inefficient---converting
     numbers to and from decimal is thousands of instructions, and binary encod-
     ings are often much smaller than text.  In the future, I would love to have
     a flag that enables a binary encoding.

     Typing is optional---omitting types is never wrong.

     One somewhat odd thing about typing is that we reuse the Perl pack  defini-
     tions of types, so q (for "quadword") stands for 64-bit integer.  These are
     perhaps  not  the  most  mnemonic  choices in 2022, but I would rather pick
     someone's existing set than try to define my own.

   [1mContributors[0m
     Fsdb includes code ported from Geoff  Kuenning  ("Fsdb::Support::TDistribu-
     tion").

     Fsdb   contributors:  Ashvin  Goel  [4mgoel@cse.oge.edu[24m,  Geoff  Kuenning  [4mge-[0m
     [4moff@fmg.cs.ucla.edu[24m, Vikram Visweswariah [4mvisweswa@isi.edu[24m, Kannan Varadahan
     [4mkannan@isi.edu[24m, Lars Eggert [4mlarse@isi.edu[24m, Arkadi Gelfond [4markadig@dyna.com[24m,
     David Graff [4mgraff@ldc.upenn.edu[24m, Haobo Yu  [4mhaoboy@packetdesign.com[24m,  Pavlin
     Radoslavov  [4mpavlin@catarina.usc.edu[24m, Graham Phillips, Yuri Pradkin, Alefiya
     Hussain, Ya Xu, Michael Schwendt, Fabio  Silva  [4mfabio@isi.edu[24m,  Jerry  Zhao
     [4mzhaoy@isi.edu[24m,     Ning     Xu     [4mnxu@aludra.usc.edu[24m,     Martin     Lukac
     [4mmlukac@lecs.cs.ucla.edu[24m, Xue Cai, Michael McQuaid, Christopher Meng, Calvin
     Ardi, H. Merijn Brand, Lan Wei, Hang Guo, Wes Hardaker, Erica Stutz.

     Fsdb includes datasets contributed from NIST ([4mDATA/nist_zarr13.fsdb[24m),  from
     <http://www.itl.nist.gov/div898/handbook/eda/section4/eda4281.htm>,     the
     NIST/SEMATECH e-Handbook of Statistical Methods, section  1.4.2.8.1.  Back-
     ground  and Data.  The source is public domain, and reproduced with permis-
     sion.

[1mRELATED WORK[0m
     As stated in the introduction, Fsdb is an incompatible reimplementation  of
     the  ideas  found in "/rdb".  By storing data in simple text files and pro-
     cessing it with pipelines it is easy to experiment (in the shell) and  look
     at  the  output.  The original implementation of this idea was /rdb, a com-
     mercial product described in the book [4mUNIX[24m [4mrelational[24m [4mdatabase[24m  [4mmanagement:[0m
     [4mapplication[24m  [4mdevelopment[24m  [4min[24m [4mthe[24m [4mUNIX[24m [4menvironment[24m by Rod Manis, Evan Schaf-
     fer, and Robert Jorgensen (and also at the web page <http://www.rdb.com/>).

     While Fsdb is inspired by /rdb, it includes no code from it, and Fsdb makes
     several different design choices.  In particular: /rdb attempts to be
     closer to a "real" database, with provision for locking and file indexing.
     Fsdb focuses on single-user use and so eschews these features.  /rdb also
     has some support for interactive editing.  Fsdb leaves editing to text
     editors like emacs or vi.

     In August, 2002 I found out Carlo Strozzi extended  RDB  with  his  package
     NoSQL  <http://www.linux.it/~carlos/nosql/>.   According to Mr. Strozzi, he
     implemented NoSQL in awk to avoid Perl start-up costs in RDB.   Although  I
     haven't  found  Perl  startup  overhead to be a big problem on my platforms
     (from old Sparcstation IPCs to 2GHz Pentium-4s), you may want  to  evaluate
     his   system.    The   Linux   Journal   has  a  description  of  NoSQL  at
     <http://www.linuxjournal.com/article/3294>.   It  seems  quite  similar  to
     Fsdb.   Like /rdb, NoSQL supports indexing (not present in Fsdb).  Fsdb ap-
     pears to have richer support for statistics, and, as of Fsdb-2.x, its  sup-
     port  for  Perl threading may support faster performance (one-process, less
     serialization and deserialization).

[1mRELEASE NOTES[0m
     Versions prior to 1.0 were released informally on my web page but were  not
     announced.

   [1m0.0 1991[0m
     started for my own research use

   [1m0.1 26-May-94[0m
     first check-in to RCS

   [1m0.2 15-Mar-95[0m
     parts now require perl5

   [1m1.0, 22-Jul-97[0m
     adds autoconf support and a test script.

   [1m1.1, 20-Jan-98[0m
     support for double space field separators, better tests

   [1m1.2, 11-Feb-98[0m
     minor changes and release on comp.lang.perl.announce

   [1m1.3, 17-Mar-98[0m
     *   adds median and quartile options to dbstats

     *   adds dmalloc_to_db converter

     *   fixes some warnings

     *   dbjoin now can run on unsorted input

     *   fixes a dbjoin bug

     *   some more tests in the test suite

   [1m1.4, 27-Mar-98[0m
     *   improves  error  messages (all should now report the program that makes
         the error)

     *   fixed a bug in dbstats output when the mean is zero

   [1m1.5, 25-Jun-98[0m
     BUG FIX dbcolhisto, dbcolpercentile now handle non-numeric values like
     dbstats
     NEW dbcolstats computes zscores and tscores over a column
     NEW dbcolscorrelate computes correlation coefficients between two columns
     INTERNAL ficus_getopt.pl has been replaced by DbGetopt.pm
     BUG FIX all tests are now ``portable'' (previously some tests ran only on
     my system)
     BUG FIX you no longer need to have the db programs in your path (fix arose
     from a discussion with Arkadi Gelfond)
     BUG FIX installation no longer uses cp -f (to work on SunOS 4)

   [1m1.6, 24-May-99[0m
     NEW dbsort, dbstats, dbmultistats now run in constant memory (using tmp
     files if necessary)
     NEW dbcolmovingstats does moving means over a series of data
     NEW dbcol has a -v option to get all columns except those listed
     NEW dbmultistats does quartiles and medians
     NEW dbstripextraheaders now also cleans up bogus comments before the first
     header
     BUG FIX dbcolneaten works better with double-space-separated data

   [1m1.7,  5-Jan-00[0m
     NEW dbcolize now detects and rejects lines that contain embedded copies of
     the field separator
     NEW configure tries harder to prevent people from improperly configur-
     ing/installing fsdb
     NEW tcpdump_to_db converter (incomplete)
     NEW tabdelim_to_db converter:  from spreadsheet tab-delimited files to db
     NEW mailing lists for fsdb are     "fsdb-announce@heidemann.la.ca.us"
     and  "fsdb-talk@heidemann.la.ca.us"
         To subscribe to either, send  mail  to    "fsdb-announce-request@heide-
         mann.la.ca.us"   or   "fsdb-talk-request@heidemann.la.ca.us"       with
         "subscribe" in the BODY of the message.

     BUG FIX dbjoin used to produce incorrect output if there were extra, un-
     matched values in the 2nd table. Thanks to Graham Phillips for providing a
     test case.
     BUG FIX the sample commands in the usage strings now all should explicitly
     include the source of data (typically from "cat foo.fsdb |").  Thanks to Ya
     Xu for pointing out this documentation deficiency.
     BUG FIX (DOCUMENTATION) dbcolmovingstats had incorrect sample output.

   [1m1.8, 28-Jun-00[0m
     BUG FIX header options are now preserved when writing with dblistize
     NEW dbrowuniq now optionally checks for uniqueness only on certain fields
     NEW dbrowsplituniq makes one pass through a file and splits it into sepa-
     rate files based on the given fields
     NEW converter for "crl" format network traces
     NEW anywhere you use arbitrary code (like dbroweval), _last_foo now maps to
     the last row's value for field _foo.
     OPTIMIZATION comment processing slightly changed so that dbmultistats now
     is much faster on files with lots of comments (for example, ~100k lines of
     comments and 700 lines of data!) (Thanks to Graham Phillips for pointing
     out this performance problem.)
     BUG FIX dbstats with median/quartiles now correctly handles singleton data
     points.

   [1m1.9,  6-Nov-00[0m
     NEW dbfilesplit, split a single input file into multiple output files
     (based on code contributed by Pavlin Radoslavov).
     BUG FIX dbsort now works with perl-5.6

   [1m1.10, 10-Apr-01[0m
     BUG FIX dbstats now handles the case where there are more n-tiles than data
     NEW dbstats now includes a -S option to optimize work on pre-sorted data
     (inspired by code contributed by Haobo Yu)
     BUG FIX dbsort now has a better estimate of memory usage when run on data
     with very short records (problem detected by Haobo Yu)
     BUG FIX cleanup of temporary files is slightly better

   [1m1.11,  2-Nov-01[0m
     BUG FIX dbcolneaten now runs in constant memory
     NEW dbcolneaten now supports "field specifiers" that allow some control
     over how wide columns should be
     OPTIMIZATION dbsort now tries hard to be filesystem cache-friendly
     (inspired by "Information and Control in Gray-box Systems" by the
     Arpaci-Dusseaus at SOSP 2001)
     INTERNAL t_distr now ported to perl5 module DbTDistr

   [1m1.12,  30-Oct-02[0m
     BUG FIX dbmultistats documentation typo fixed
     NEW dbcolmultiscale
     NEW dbcol has -r option for "relaxed error checking"
     NEW dbcolneaten has new -e option to strip end-of-line spaces
     NEW dbrow finally has a -v option to negate the test
     BUG FIX math bug in dbcoldiff fixed by Ashvin Goel (need to check Scheaffer
     test cases)
     BUG FIX some patches to run with Perl 5.8. Note: some programs (dbcolmulti-
     scale, dbmultistats, dbrowsplituniq) generate warnings like: "Use of unini-
     tialized value in concatenation (.)" or "string at
     /usr/lib/perl5/5.8.0/FileCache.pm line 98, <STDIN> line 2". Please ignore
     this until I figure out how to suppress it. (Thanks to Jerry Zhao for
     noticing perl-5.8 problems.)
     BUG FIX fixed an autoconf problem where configure would fail to find a rea-
     sonable prefix (thanks to Fabio Silva for reporting the problem)
     NEW db_to_html_table: simple conversion to html tables (NO fancy stuff)
     NEW dblib now has a function [1mdblib_text2html() [22mthat will do simple conver-
     sion of iso-8859-1 to HTML

   [1m1.13,  4-Feb-04[0m
     NEW fsdb added to the freebsd ports tree <http://www.freshports.org/data-
     bases/fsdb/>.  Maintainer: "larse@isi.edu"
     BUG FIX properly handle trailing spaces when data must be numeric (ex. db-
     stats with -FS, see test dbstats_trailing_spaces). Fix from Ning Xu
     "nxu@aludra.usc.edu".
     NEW dbcolize error message improved (bug report from Terrence Brannon), and
     list format documented in the README.
     NEW cgi_to_db converts CGI.pm-format storage to fsdb list format
     BUG FIX handle numeric synonyms for column names in dbcol properly
     ENHANCEMENT "talking about columns" section added to README. Lack of docu-
     mentation pointed out by Lars Eggert.
     CHANGE dbformmail now defaults to using Mail ("Berkeley Mail") to send
     mail, rather than sendmail (sendmail is still an option, but mail doesn't
     require running as root)
     NEW on platforms that support it (i.e., with perl 5.8), fsdb works fine
     with unicode
     NEW dbfilevalidate: check a db file for some common errors

   [1m1.14,  24-Aug-06[0m
     ENHANCEMENT README cleanup
     INCOMPATIBLE CHANGE dbcolsplit renamed dbcolsplittocols
     NEW dbcolsplittorows  split one column into multiple rows
     NEW dbcolsregression compute linear regression and correlation for two
     columns
     ENHANCEMENT csv_to_db: better error handling, normalize field names, skip
     blank lines
     ENHANCEMENT dbjoin now detects (and fails) if non-joined files have dupli-
     cate names
     BUG FIX minor bug fixed in calculation of Student t-distributions (doesn't
     change any test output, but may have caused small errors)

   [1m1.15, 12-Nov-07[0m
     NEW fsdb-1.14 added to the MacOS Fink system <http://pdb.finkpro-
     ject.org/pdb/package.php/fsdb>. (Thanks to Lars Eggert for maintaining this
     port.)
     NEW Fsdb::IO::Reader and Fsdb::IO::Writer now provide reasonably clean OO
     I/O interfaces to Fsdb files.  Highly recommended if you use fsdb directly
     from perl.  In the fullness of time I expect to reimplement the entire
     thing using these APIs to replace the current dblib.pl which is still hob-
     bled by its roots in perl4.
     NEW dbmapreduce now implements a Google-style map/reduce abstraction, gen-
     eralizing dbmultistats.
     ENHANCEMENT fsdb now uses the Perl build system (Makefile.PL, etc.), in-
     stead of autoconf.  This change paves the way to better perl-5-style modu-
     larization, proper manual pages, input of both listize and colize format
     for every program, and world peace.
     ENHANCEMENT dblib.pl is now moved to Fsdb::Old.pm.
     BUG FIX dbmultistats now propagates its format argument (-f). Bug and fix
     from Martin Lukac (thanks!).
     ENHANCEMENT dbformmail documentation now is clearer that it doesn't send
     the mail, you have to run the shell script it writes.  (Problem observed by
     Unkyu Park.)
     ENHANCEMENT adapted to autoconf-2.61 (and then these changes were discarded
     in favor of The Perl Way).
     BUG FIX dbmultistats memory usage corrected (O(# tags), not O(1))
     ENHANCEMENT dbmultistats can now optionally run with pre-grouped input in
     O(1) memory
     ENHANCEMENT dbroweval -N was finally implemented (eat comments)

   [1m2.0, 25-Jan-08[0m
     A quiet 2.0 release (gearing up towards complete).

     ENHANCEMENT: shifting old programs to Perl modules, with the front-end
     program as just a wrapper. In the short-term, this change just means
     programs have real man pages. In the long-run, it will mean that one can
     run a pipeline in a single Perl program. So far: dbcol, dbroweval, the new
     dbrowcount, dbsort, the new dbmerge, the old "dbstats" (renamed
     dbcolstats), dbcolrename, dbcolcreate.
     NEW: Fsdb::Filter::dbpipeline is an internal-only module that lets one use
     fsdb commands from within perl (via threads).
         It also provides perl function aliases for the internal modules,  so  a
         string of fsdb commands in perl are nearly as terse as in the shell:

             use Fsdb::Filter::dbpipeline qw(:all);
             dbpipeline(
                 dbrow(qw(name test1)),
                 dbroweval('_test1 += 5;')
             );

     INCOMPATIBLE CHANGE: The old dbcolstats has been renamed dbcolstatscores.
     The new dbcolstats does the same thing as the old dbstats. This incompati-
     bility is unfortunate but normalizes program names.
     CHANGE: The new dbcolstats program always outputs "-" (the default empty
     value) for statistics it cannot compute (for example, standard deviation if
     there is only one row), instead of the old mix of "-" and "na".
     INCOMPATIBLE CHANGE: The old dbcolstats program, now called dbcol-
     statscores, also has different arguments.  The "-t mean,stddev" option is
     now "--tmean mean --tstddev stddev".  See dbcolstatscores for details.
     INCOMPATIBLE CHANGE: dbcolcreate now assumes all new columns get the
     default value rather than requiring each column to have an initial constant
     value. To change the initial value, use the new "-e" option.
     NEW: dbrowcount counts rows, an almost-subset of dbcolstats's "n" output
     (except without differentiating numeric/non-numeric input), or the equiva-
     lent of "dbstripcomments | wc -l".
     NEW: dbmerge merges two sorted files. This functionality was previously em-
     bedded in dbsort.
     INCOMPATIBLE CHANGE: dbjoin's "-i" option to include non-matches is now re-
     named "-a", so as to not conflict with the new standard option "-i" for in-
     put file.

   [1m2.1,  6-Apr-08[0m
     Another alpha 2.0, but now all converted programs understand both listize
     and colize format.

     ENHANCEMENT: shifting more old programs to Perl modules. New in 2.1: dbcol-
     neaten, dbcoldefine, dbcolhisto, dblistize, dbcolize, dbrecolize
     ENHANCEMENT dbmerge now handles an arbitrary number of input files, not
     just exactly two.
     NEW dbmerge2 is an internal routine that handles merging exactly two files.
     INCOMPATIBLE CHANGE dbjoin now specifies inputs like dbmerge2, rather than
     assuming the first two arguments were tables (as in fsdb-1).
         The old dbjoin argument "-i" is now "-a" or "--type=outer".

         A  minor change: comments in the source files for dbjoin are now inter-
         mixed with output rather than being delayed until the end.

     ENHANCEMENT dbsort now no longer produces warnings when null values are
     passed to numeric comparisons.
     BUG FIX dbroweval now once again works with code that lacks a trailing
     semicolon. (This bug fixes a regression from 1.15.)
     INCOMPATIBLE CHANGE dbcolneaten's old "-e" option (to avoid end-of-line
     spaces) is now "-E" to avoid conflicts with the standard empty field argu-
     ment.
     INCOMPATIBLE CHANGE dbcolhisto's old "-e" option is now "-E" to avoid con-
     flicts. And its "-n", "-s", and "-w" are now "-N", "-S", and "-W" to corre-
     spond.
     NEW dbfilealter replaces dbrecolize, dblistize, and dbcolize, but with dif-
     ferent options.
     ENHANCEMENT The library routines "Fsdb::IO" now understand both list-format
     and column-format data, so all converted programs can now [4mautomatically[0m
     read either format.  This capability was one of the milestone goals for
     2.0, so yea!

   [1m2.2, 23-May-08[0m
     Release 2.2 is another 2.x alpha release.  Now [4mmost[24m  of  the  commands  are
     ported,  but  a few remain, and I plan one last incompatible change (to the
     file header) before 2.x final.

     ENHANCEMENT
         shifting more old programs to Perl modules.  New in 2.2:
         dbrowaccumulate, dbformmail, dbcolmovingstats, dbrowuniq, dbrowdiff,
         dbcolmerge, dbcolsplittocols, dbcolsplittorows, dbmapreduce,
         dbmultistats, dbrvstatdiff.  Also dbrowenumerate exists only as a
         front-end (command-line) program.

     INCOMPATIBLE CHANGE
         The  following  programs have been dropped from fsdb-2.x: dbcoltighten,
         dbfilesplit, dbstripextraheaders, dbstripleadingspace.

     NEW combined_log_format_to_db to convert Apache logfiles

     INCOMPATIBLE CHANGE
         Options to dbrowdiff are now [1m-B [22mand [1m-I[22m, not [1m-a [22mand [1m-i[22m.

     INCOMPATIBLE CHANGE
         dbstripcomments is now dbfilestripcomments.

     BUG FIXES
         dbcolneaten better handles empty columns; dbcolhisto warning suppressed
         (actually a bug in high-bucket handling).

     INCOMPATIBLE CHANGE
         dbmultistats now requires a "-k" option  in  front  of  the  key  (tag)
         field, or if none is given, it will group by the first field (both like
         dbmapreduce).

     KNOWN BUG
         dbmultistats with quantile option doesn't work currently.

     INCOMPATIBLE CHANGE
         dbcoldiff is renamed dbrvstatdiff.

     BUG FIXES
         dbformmail  was  leaving  its log message as a  command, not a comment.
         Oops.  No longer.

   [1m2.3, 27-May-08 (alpha)[0m
     Another alpha release, this one just to fix the critical dbjoin bug  listed
     below (that happens to have blocked my MP3 jukebox :-).

     BUG FIX
         Dbsort no longer hangs if given an input file with no rows.

     BUG FIX
         Dbjoin  now  works  with  unsorted  input  coming from a pipeline (like
         stdin).   Perl-5.8.8  has  a  bug  (?)  that  was  making   this   case
         fail---opening  stdin in one thread, reading some, then reading more in
         a different thread caused an lseek which works on files, but  fails  on
         pipes like stdin.  Go figure.

     BUG FIX / KNOWN BUG
         The dbjoin fix also fixed dbmultistats -q (it now gives the right
         answer).  However, a new bug appeared, with messages like:

             Attempt to free unreferenced scalar: SV 0xa9dd0c4, Perl interpreter: 0xa8350b8 during global destruction.

         So the dbmultistats_quartile test is still disabled.

   [1m2.4, 18-Jun-08[0m
     Another  alpha  release, mostly to fix minor usability problems in dbmapre-
     duce and client functions.

     ENHANCEMENT
         dbrow now defaults to running user supplied code without  warnings  (as
         with fsdb-1.x).  Use "--warnings" or "-w" to turn them back on.

     ENHANCEMENT
         dbroweval  can  now write different format output than the input, using
         the "-m" option.

     KNOWN BUG
         dbmapreduce emits warnings on perl 5.10.0 about "Unbalanced string  ta-
         ble refcount" and "Scalars leaked" when run with an external program as
         a reducer.

         dbmultistats  emits  the  warning "Attempt to free unreferenced scalar"
         when run with quartiles.

         In each case the output is correct.  I believe these can be ignored.

     CHANGE
         dbmapreduce no longer logs a line for each reducer that is invoked.

   [1m2.5, 24-Jun-08[0m
     Another alpha release, fixing more minor bugs in "dbmapreduce" and  lossage
     in "Fsdb::IO".

     ENHANCEMENT
         dbmapreduce  can now tolerate non-map-aware reducers that pass back the
         key column in put.  It also passes the current key as the last argument
         to external reducers.

     BUG FIX
         Fsdb::IO::Reader, correctly handle  "-header"  option  again.   (Broken
         since fsdb-2.3.)

   [1m2.6, 11-Jul-08[0m
     Another alpha release, needed to fix DaGronk.  One new port, small bug
     fixes, and an important fix to dbmapreduce.

     ENHANCEMENT
         shifting more old programs to Perl modules.  New in 2.6:
         dbcolpercentile.

     INCOMPATIBLE CHANGE and ENHANCEMENTS
         dbcolpercentile arguments changed, use "--rank" to require ranking
         instead of "-r".  Also, "--ascending" and "--descending" can now be
         specified separately, both for "--percentile" and "--rank".

     BUG FIX
         Sigh,  the  sense  of  the --warnings option in dbrow was inverted.  No
         longer.

     BUG FIX
         I found and fixed the string leaks (errors like "Unbalanced string  ta-
         ble  refcount"  and  "Scalars leaked") in dbmapreduce and dbmultistats.
         (All "IO::Handle"s in threads must be manually destroyed.)

     BUG FIX
         The "-C" option to specify the column separator in dbcolsplittorows now
         works again (broken since it was ported).

   [1m2.7, 30-Jul-08 beta[0m
     The beta release of fsdb-2.x.  Finally, all programs are ported.   As  sta-
     tistics,  the  number  of  lines  of  non-library code doubled from 7.5k to
     15.5k.  The libraries are much more complete, going from 866 to 5164 lines.
     The overall number of programs is about the same, although 19 were  dropped
     and  11  were  added.   The number of test cases has grown from 116 to 175.
     All programs are now in perl-5, no more shell scripts or perl-4.  All  pro-
     grams now have manual pages.

     Although  this  is  a major step forward, I still expect to rename "jdb" to
     "fsdb".

     ENHANCEMENT
         shifting more old programs to Perl modules.  New in 2.7:
         dbcolscorrelate, dbcolsregression, cgi_to_db, dbfilevalidate,
         db_to_csv, csv_to_db, db_to_html_table, kitrace_to_db, tcpdump_to_db,
         tabdelim_to_db, ns_to_db.

     INCOMPATIBLE CHANGE
         The following programs have been dropped from fsdb-2.x: db2dcliff,
         dbcolmultiscale, crl_to_db, ipchain_logs_to_db.  They may come back,
         but seemed overly specialized.  dbrowsplituniq was dropped because it
         is superseded by dbmapreduce.  dmalloc_to_db was dropped pending test
         cases and examples.

     ENHANCEMENT
         dbfilevalidate now has a "-c" option to correct errors.

     NEW html_table_to_db provides the inverse of db_to_html_table.

   [1m2.8,  5-Aug-08[0m
     Change header format, preserving forwards compatibility.

     BUG FIX
         Complete editing pass over the  manual,  making  sure  it  aligns  with
         fsdb-2.x.

     SEMI-COMPATIBLE CHANGE
         The  header  of fsdb files has changed, it is now #fsdb, not #h (or #L)
         and parsing of -F and -R are also different.  See dbfilealter  for  the
         new  specification.   The  v1 file format will be read, compatibly, but
         not written.

     BUG FIX
         dbmapreduce now tolerates comments that precede the first key,  instead
         of failing with an error message.

   [1m2.9, 6-Aug-08[0m
     Still in beta; just a quick bug-fix for dbmapreduce.

     ENHANCEMENT
         dbmapreduce now generates plausible output when given no rows of input.

   [1m2.10, 23-Sep-08[0m
     Still in beta, but picking up some bug fixes.

     ENHANCEMENT
         dbmapreduce now generates plausible output when given no rows of input.

     ENHANCEMENT
         dbroweval: the warnings option was backwards; now corrected.  As a
         result, warnings in user code now default off (like in fsdb-1.x).

     BUG FIX
         dbcolpercentile now defaults to assuming the target column is  numeric.
         The new option "-N" allows selection of a non-numeric target.

     BUG FIX
         dbcolscorrelate  now  includes  "--sample"  and "--nosample" options to
         compute the sample or full population correlation coefficients.  Thanks
         to Xue Cai for finding this bug.

   [1m2.11, 14-Oct-08[0m
     Still in beta, but picking up some bug fixes.

     ENHANCEMENT
         html_table_to_db is now more aggressive about filling  in  empty  cells
         with  the  official  empty  value, rather than leaving them blank or as
         whitespace.

     ENHANCEMENT
         dbpipeline now catches failures during pipeline element setup and exits
         reasonably gracefully.

     BUG FIX
         dbsubprocess now reaps child processes, thus avoiding  running  out  of
         processes when used a lot.

   [1m2.12, 16-Oct-08[0m
     Finally, a full (non-beta) 2.x release!

     INCOMPATIBLE CHANGE
         Jdb  has  been  renamed  Fsdb,  the  flatfile-streaming database.  This
         change affects all internal Perl APIs, but no shell command-level APIs.
         While Jdb served well for more than ten years, it  is  easily  confused
         with  the Java debugger (even though Jdb was there first!).  It also is
         too generic to work well in web search engines.   Finally,  Jdb  stands
         for  ``John's  database'', and we're a bit beyond that.  (However, some
         call me the ``file-system guy'', so one could  argue  it  retains  that
         meaning.)

         If you just used the shell commands, this change should not affect you.
         If  you used the Perl-level libraries directly in your code, you should
         be able to rename "Jdb" to "Fsdb" to move to 2.12.

         The jdb-announce list has not yet been renamed, but it will be shortly.

         With this release I've accomplished everything I wanted to in fsdb-2.x.
         I therefore expect to return to boring, bugfix releases.

   [1m2.13, 30-Oct-08[0m
     BUG FIX
         dbrowaccumulate now treats non-numeric data as zero by default.

     BUG FIX
         Fixed a perl-5.10ism in dbmapreduce that breaks that program under 5.8.
         Thanks to Martin Lukac for reporting the bug.

   [1m2.14, 26-Nov-08[0m
     BUG FIX
         Improved documentation for dbmapreduce's "-f" option.

     ENHANCEMENT
         dbcolmovingstats now computes a moving standard deviation in addition
         to a moving mean.

   [1m2.15, 13-Apr-09[0m
     BUG FIX
         Fix a [4mmake[24m [4minstall[24m bug reported by Shalindra Fernando.

   [1m2.16, 14-Apr-09[0m
     BUG FIX
         Another minor release bug: on some systems [4mprogramize_module[24m loses
         executable permissions.  Again reported by Shalindra Fernando.

   [1m2.17, 25-Jun-09[0m
     TYPO FIXES
         Typo in the [4mdbroweval[24m manual fixed.

     IMPROVEMENT
         There  is no longer a comment line to label columns in [4mdbcolneaten[24m, in-
         stead the header line is tweaked to line up.  This change restores  the
         Jdb-1.x behavior, and means that repeated runs of dbcolneaten no longer
         add comment lines each time.

     BUG FIX
         It  turns  out   [4mdbcolneaten[24m was not correctly handling trailing spaces
         when given the "-E" option to suppress them.  This  regression  is  now
         fixed.

     EXTENSION
         [1mdbroweval[22m(1)  can  now  handle  direct  references  to the last row via
         [4m$lfref[24m, a dubious but now documented feature.

     BUG FIXES
         Separators set with "-C" in [4mdbcolmerge[24m and  [4mdbcolsplittocols[24m  were  not
         properly setting the heading, and null fields were not recognized.  The
         first bug was reported by Martin Lukac.

   [1m2.18,  1-Jul-09  A minor release[0m
     IMPROVEMENT
         Documentation for [4mFsdb::IO::Reader[24m has been improved.

     IMPROVEMENT
         The package should now be PGP-signed.

   [1m2.19,  10-Jul-09[0m
     BUG FIX
         Internal improvements to debugging output and robustness of [4mdbmapreduce[0m
         and [4mdbpipeline[24m.  [4mTEST/dbpipeline_first_fails.cmd[24m re-enabled.

   [1m2.20,  30-Nov-09 (A collection of minor bugfixes, plus a build against Fedora[0m
     [1m12.)[0m
     BUG FIX
         Logging for [4mdbmapreduce[24m with code refs is now stable (it no longer  in-
         cludes a hex pointer to the code reference).

     BUG FIX
         Better handling of mixed blank lines in [4mFsdb::IO::Reader[24m (see test case
         [4mdbcolize_blank_lines.cmd[24m).

     BUG FIX
         [4mhtml_table_to_db[24m  now  handles multi-line input better, and handles ta-
         bles with COLSPAN.

     BUG FIX
         [4mdbpipeline[24m now cleans up threads in an "eval" to prevent "cannot detach
         a joined thread" errors that popped up in  perl-5.10.   Hopefully  this
         prevents a race condition that causes the test suites to hang about 20%
         of the time (in [4mdbpipeline_first_fails[24m).

     IMPROVEMENT
         [4mdbmapreduce[24m  now detects and correctly fails when the input and reducer
         have incompatible field separators.

     IMPROVEMENT
         [4mdbcolstats[24m, [4mdbcolhisto[24m, [4mdbcolscorrelate[24m, [4mdbcolsregression[24m,  and  [4mdbrow-[0m
         [4mcount[24m  now  all take an "-F" option to let one specify the output field
         separator (so they work better with [4mdbmapreduce[24m).

     BUG FIX
         An omitted "-k" from the manual page of [4mdbmultistats[24m is now there.  Bug
         reported by Unkyu Park.

   [1m2.21, 17-Apr-10 bug fix release[0m
     BUG FIX
         [4mFsdb::IO::Writer[24m now no longer fails with -outputheader  =>  never  (an
         obscure bug).

     IMPROVEMENT
         [4mFsdb[24m  (in the warnings section) and [4mdbcolstats[24m now more carefully docu-
         ment how they handle (and do not handle) numerical precision  problems,
         and  other  general  limits.  Thanks to Yuri Pradkin for prompting this
         documentation.

     IMPROVEMENT
         "Fsdb::Support::fullname_to_sortkey" is now restored from "Jdb".

     IMPROVEMENT
         Documentation for multiple styles of input approaches (including perfor-
         mance description) added to Fsdb::IO.

   [1m2.22,  2010-10-31  One  new tool [4mdbcolcopylast[24m and several bug fixes for Perl[0m
     [1m5.10.[0m
     BUG FIX
         [4mdbmerge[24m now correctly handles n-way merges.  Bug reported by Yuri Prad-
         kin.

     INCOMPATIBLE CHANGE
         [4mdbcolneaten[24m now defaults to [4mnot[24m padding the last column.

     ADDITION
         [4mdbrowenumerate[24m now takes [1m-N NewColumn [22mto give the  new  column  a  name
         other than "count".  Feature requested by Mike Rouch in January 2005.

     ADDITION
         New  program [4mdbcolcopylast[24m copies the last value of a column into a new
         column copylast_column of the next row.  New program requested by Fabio
         Silva; useful for converting dbmultistats output into dbrvstatdiff  in-
         put.
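
         The copy-forward behavior described above can be  sketched  in  Python
         (not  Fsdb's  own  Perl).   This is only an illustration: the function
         name "copy_last" and the use of "-" as the value for the first row are
         assumptions, not a description of how dbcolcopylast is implemented.

```python
def copy_last(rows, column, initial="-"):
    """Return new rows with a copylast_<column> field holding the previous row's value.

    `initial` is what the first row receives; "-" is an assumed placeholder.
    """
    previous = initial
    out = []
    for row in rows:
        new_row = dict(row)                      # leave the input rows untouched
        new_row[f"copylast_{column}"] = previous
        previous = row[column]                   # remember this value for the next row
        out.append(new_row)
    return out


rows = [
    {"experiment": "ufs_mab_sys", "duration": "37.2"},
    {"experiment": "ufs_mab_sys", "duration": "37.3"},
    {"experiment": "ufs_rcp_real", "duration": "264.5"},
]
result = copy_last(rows, "duration")
```

         Each output row then carries both its own value and the preceding one,
         which is the shape a pairwise comparison needs.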

     BUG FIX
         Several  tools (particularly [4mdbmapreduce[24m and [4mdbmultistats[24m) would report
         errors like "Unbalanced string table refcount: (1) for "STDOUT"  during
         global  destruction" on exit, at least on certain versions of Perl (for
         me on 5.10.1), but similar errors have been off-and-on for several Perl
         releases.  Although I think my code looked OK,  I  worked  around  this
         problem with a different way of handling standard IO redirection.

   [1m2.23,  2011-03-10 Several small portability bugfixes; improved [4mdbcolstats[24m for[0m
     [1mlarge datasets[0m
     IMPROVEMENT
         Documentation to [4mdbrvstatdiff[24m was changed to use "sd" to refer to stan-
         dard deviation, not "ss" (which might be confused with sum-of-squares).

     BUG FIX
         This documentation about [4mdbmultistats[24m was missing the [4m-k[24m option in some
         cases.

     BUG FIX
         [4mdbmapreduce[24m was failing on MacOS-10.6.3 for some tests with the error

             dbmapreduce: cannot run external dbmapreduce reduce program (perl TEST/dbmapreduce_external_with_key.pl)

         The problem seemed to be only in the error, not in operation.   On  Ma-
         cOS,  the  error is now suppressed.  Thanks to Alefiya Hussain for pro-
         viding access to a Mac system that allowed debugging of this problem.

     IMPROVEMENT
         The [4mcsv_to_db[24m command requires an external Perl library ([4mText::CSV_XS[24m).
         On computers that lack this optional  library,  previously  Fsdb  would
         configure  with  a  warning  and then test cases would fail.  Now those
         test cases are skipped with an additional warning.

     BUG FIX
         The test suite now supports alternative valid output, as a hack to  ac-
         count  for last-digit floating point differences.  (Not very satisfying
         :-( )

     BUG FIX
         [4mdbcolstats[24m output for confidence intervals on very large  datasets  has
         changed.   Previously  it failed for more than 2^31-1 records, and han-
         dling of T-Distributions with thousands of rows was a bit dubious.  Now
         datasets with more than 10000 records are considered infinitely  large
         and hopefully handled correctly.

   [1m2.24, 2011-04-15 Improvements to fix an old bug in dbmapreduce with different[0m
     [1mfield separators[0m
     IMPROVEMENT
         The [4mdbfilealter[24m command had a "--correct" option  to  work-around  from
         incompatible  field-separators,  but  it  did nothing.  Now it does the
         correct but sad, data-losing thing.

     IMPROVEMENT
         The [4mdbmultistats[24m command previously failed with an error  message  when
         invoked  on  input  with a non-default field separator.  The root cause
         was the underlying [4mdbmapreduce[24m that did not handle the case of reducers
         that generated output with a different field separator than the  input.
         We  now  detect  and repair incompatible field separators.  This change
         corrects a problem originally documented  and  detected  in  Fsdb-2.20.
         Bug re-reported by Unkyu Park.

   [1m2.25,  2011-08-07  Two new tools, [4mxml_to_db[24m and [4mdbfilepivot[24m, and a bugfix for[0m
     [1mtwo people.[0m
     IMPROVEMENT
         [4mkitrace_to_db[24m now supports a [4m--utc[24m option, which also fixes  this  test
         case for users outside of the Pacific time zone.  Bug reported by David
         Graff, and also by Peter Desnoyers (within a week of each other :-)

     NEW [4mxml_to_db[24m can convert simple, very regular XML files into Fsdb.

     NEW [4mdbfilepivot[24m  "pivots" a file, converting multiple rows corresponding to
         the same entity into a single row with multiple columns.
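
         As a rough illustration of what "pivoting" means here, the sketch  be-
         low  turns several (entity, attribute, value) rows into one row per en-
         tity.  It is written in Python, not Fsdb's Perl; the function  "pivot",
         its  parameters,  and  the "-" placeholder for missing values are hypo-
         thetical and say nothing about how dbfilepivot itself works.

```python
def pivot(rows, key, pivot_col, value_col, empty="-"):
    """Collapse rows sharing `key` into one row with a column per pivot value."""
    columns = []    # pivot values, in order of first appearance
    entities = {}   # key value -> {pivot value: value}
    for row in rows:
        if row[pivot_col] not in columns:
            columns.append(row[pivot_col])
        entities.setdefault(row[key], {})[row[pivot_col]] = row[value_col]
    # Every output row gets every column; missing cells get the placeholder.
    return [
        {key: name, **{c: values.get(c, empty) for c in columns}}
        for name, values in entities.items()
    ]


long_rows = [
    {"name": "alice", "attr": "age", "value": "30"},
    {"name": "alice", "attr": "city", "value": "LA"},
    {"name": "bob", "attr": "age", "value": "25"},
]
wide_rows = pivot(long_rows, "name", "attr", "value")
```

         This sketch holds all rows in memory, so it does not care about  input
         order; a streaming tool has to make different trade-offs.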

   [1m2.26, 2011-12-12 Bug fixes, particularly for perl-5.14.2.[0m
     BUG FIX
         Bugs fixed in [1mFsdb::IO::Reader[22m(3) manual page.

     BUG FIX
         Fixed problems where dbcolstats was truncating floating  point  numbers
         when  sorting.   This strange behavior happens as of perl-5.14.2 and it
         [4mseems[24m like a Perl bug.  I've worked around it for the test suites,  but
         I'm a bit nervous.

   [1m2.27, 2012-11-15 Accumulated bug fixes.[0m
     IMPROVEMENT
         [4mcsv_to_db[24m now reports errors in CSV input with real diagnostics.

     IMPROVEMENT
         [4mdbcolmovingstats[24m can now compute median, when given the "-m" option.

     BUG FIX
         [4mdbcolmovingstats[24m non-numeric handling (the "-a" option) now works prop-
         erly.

     DOCUMENTATION
         The internal [4mt/test_command.t[24m test framework is now documented.

     BUG FIX
         [4mdbrowuniq[24m  now correctly handles the case where there is no input (pre-
         viously it output a blank  line,  which  is  a  malformed  fsdb  file).
         Thanks to Yuri Pradkin for reporting this bug.

   [1m2.28, 2012-11-15 A quick release to fix most rpmlint errors.[0m
     BUG FIX
         Fixed  a  number  of minor release problems (wrong permissions, old FSF
         address, etc.) found by rpmlint.

   [1m2.29, 2012-11-20 a quick release for CPAN testing[0m
     IMPROVEMENT
         Tweaked the RPM spec.

     IMPROVEMENT
         Modified [4mMakefile.PL[24m to fail gracefully on Perl installations that lack
         threads.  (Without this fix, I get massive failures in the non-ithreads
         test system.)

   [1m2.30, 2012-11-25 improvements to perl portability[0m
     BUG FIX
         Removed unicode character in documention  of  [4mdbcolscorrelated[24m  so  pod
         tests will pass.  (Sigh, that should work :-( )

     BUG FIX
         Fixed  test  suite failures on 5 tests ([4mdbcolcreate_double_creation[24m was
         the first) due to Carp's addition of a period.  This problem was break-
         ing Fsdb on perl-5.17.  Thanks to Michael McQuaid for helping  diagnose
         this problem.

     IMPROVEMENT
         The test suite now prints out the names of tests it tries.

   [1m2.31,  2012-11-28  A  release  with  actual  improvements  to dbfilepivot and[0m
     [1mdbrowuniq.[0m
     BUG FIX
         Documentation fixes: typos in dbcolscorrelate,  bugs  in  dbfilepivot,
         clarification for comment handling in Fsdb::IO::Reader.

     IMPROVEMENT
         Previously dbfilepivot assumed the input was grouped by keys and didn't
         verify that pre-condition.  Now there is no pre-condition (it will sort
         the input by default), and it checks if the invariant is violated.

     BUG FIX
         Previously dbfilepivot failed if the input had comments (oops  :-);  no
         longer.

     IMPROVEMENT
         Now  dbrowuniq has the "-L" option to preserve the last unique row (in-
         stead of the first), a common idiom.

   [1m2.32, 2012-12-21 Test suites should now be more numerically robust.[0m
     NEW New dbfilediff does fsdb-aware file differencing.  It does not do smart
         intuition of add/removes like Unix [1mdiff[22m(1),  but  it  does  know  about
         columns, and with "-E", it does numeric-aware differences.

     IMPROVEMENT
         Test  suites  that  are  numeric now use dbfilediff to do numeric-aware
         comparisons, so the test suite should now be robust to slightly differ-
         ent computers and operating systems and compilers than [4mexactly[24m  what  I
         use.

   [1m2.33, 2012-12-23 Minor fixes to some test cases.[0m
     IMPROVEMENT
         dbfilediff  and  dbrowuniq now support the "-N" option to give the new
         column a different name.  (And test cases where this  duplication  mat-
         tered have been fixed.)

     IMPROVEMENT
         dbrvstatdiff now shows the t-test breakpoint with a reasonable number of
         floating point digits.

     BUG FIX
         Fixed a numerical stability problem in the [4mdbroweval_last[24m test case.

   [1m2.34, 2013-02-10 Parallelism in dbmerge.[0m
     IMPROVEMENT
         Documentation for dbjoin now includes resource requirements.

     IMPROVEMENT
         Default memory usage for dbsort is now about 256MB.  (The  world  keeps
         moving forward.)

     IMPROVEMENT
         dbmerge  now does merging in parallel.  As a side-effect, dbsort should
         be faster when input overflows memory.  The level of parallelism can be
         limited with the "--parallelism" option.  (There is  more  work  to  do
         here, but we're off to a start.)

   [1m2.35, 2013-02-23 Improvements to dbmerge parallelism[0m
     BUG FIX
         Fsdb temporary files are now created more securely (with File::Temp).

     IMPROVEMENT
         Programs  that  sort or merge on fields (dbmerge2, dbmerge, dbsort, db-
         join) now report an error if no fields on which to join  or  merge  are
         given.

     IMPROVEMENT
         Parallelism  in  dbmerge  should  now  be  more  consistent, with less
         starting and stopping.

     IMPROVEMENT
         In dbmerge, the "--xargs" option lets one  give  input  filenames  on
         standard  input,  rather than the command line.  This feature paves the
         way for faster dbsort for large inputs  (by  pipelining  sorting  and
         merging), expected in the next release.

   [1m2.36, 2013-02-25 dbsort pipelines with dbmerge[0m
     IMPROVEMENT
         For  large  inputs, dbsort now pipelines sorting and merging, allowing
         earlier processing.

     BUG FIX
         Since 2.35, dbmerge delayed cleanup of  intermediate  files,  thereby
         requiring extra disk space.

   [1m2.37, 2013-02-26 quick bugfix to support parallel sort and merge from  recent[0m
     [1mreleases[0m
     BUG FIX
         Since  2.35,  dbmerge  delayed  removal  of  input  files  given  by
         "--xargs".  This problem is now fixed.

   [1m2.38, 2013-04-29 minor bug fixes[0m
     CLARIFICATION
         Configure now rejects Windows since tests seem to hang on some versions
         of  Windows.   (I  would love help from a Windows developer to get this
         problem fixed, but I cannot do it.)  See
         [4mhttps://rt.cpan.org/Ticket/Display.html?id=84201[24m.

     IMPROVEMENT
         All  programs  that  use temporary files (dbcolpercentile, dbcolscorre-
         late, dbcolstats, dbcolstatscores) now take the "-T" option and set the
         temporary directory consistently.

         In addition, error messages are better when the temporary directory has
         problems.  Problem reported by Liang Zhu.

     BUG FIX
         dbmapreduce was failing with external, map-reduce aware reducers  (when
         invoked  with  -M  and an external program).  (Sigh, did this case ever
         work?)  This case should now work.  Thanks to Yuri Pradkin for  report-
         ing this bug (in 2011).

     BUG FIX
         Fixed  perl-5.10  problem with dbmerge.  Thanks to Yuri Pradkin for re-
         porting this bug (in 2013).

   [1m2.39, 2013-05-31 quick release for the dbrowuniq extension[0m
     BUG FIX
         Actually in 2.38, the Fedora [4m.spec[24m got cleaner  dependencies.   Sugges-
         tion     from     Christopher     Meng    via    <https://bugzilla.red-
         hat.com/show_bug.cgi?id=877096>.

     ENHANCEMENT
         Fsdb files are now explicitly set into UTF-8 encoding, unless one spec-
         ifies "-encoding" to "Fsdb::IO".

     ENHANCEMENT
         dbrowuniq now supports "-I" for incremental counting.

   [1m2.40, 2013-07-13 small bug fixes[0m
     BUG FIX
         dbsort now has more respect for a user-given temporary directory; it no
         longer is ignored for merging.

     IMPROVEMENT
         dbrowuniq now has options to output the first, last, and both first and
         last rows of a run ("-F", "-L", and "-B").
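
         The  first/last/both choice for a run of identical keys can be pictured
         with the following Python sketch (Python rather than Fsdb's  Perl).   It
         is  an assumption-laden illustration: "uniq_runs" and its "mode" values
         are hypothetical names, and in "both" mode it emits a single-row  run
         only once, which is one reasonable reading of the behavior.

```python
from itertools import groupby


def uniq_runs(rows, key, mode="first"):
    """Keep the first, last, or both ends of each run of adjacent equal keys."""
    out = []
    for _, run in groupby(rows, key=key):
        run = list(run)
        if mode == "first":
            out.append(run[0])
        elif mode == "last":
            out.append(run[-1])
        elif mode == "both":
            out.append(run[0])
            if len(run) > 1:          # a singleton run is emitted only once
                out.append(run[-1])
        else:
            raise ValueError(f"unknown mode: {mode}")
    return out


rows = [("a", 1), ("a", 2), ("b", 3), ("c", 4), ("c", 5), ("c", 6)]
first_rows = uniq_runs(rows, key=lambda r: r[0], mode="first")
last_rows = uniq_runs(rows, key=lambda r: r[0], mode="last")
both_rows = uniq_runs(rows, key=lambda r: r[0], mode="both")
```

         Like uniq(1), the sketch only compares adjacent rows, so input that  is
         not grouped by key yields several runs for the same key.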

     BUG FIX
         dbrowuniq now correctly handles "-N".  Sigh, it didn't work before.

   [1m2.41, 2013-07-29 small bug and packaging fixes[0m
     ENHANCEMENT
         Documentation to dbrvstatdiff improved (inspired by questions from Qian
         Kun).

     BUG FIX
         dbrowuniq no longer duplicates singleton unique lines  when  outputting
         both (with "-B").

     BUG FIX
         Add missing "XML::Simple" dependency to [4mMakefile.PL[24m.

     ENHANCEMENT
         Tests  now  show  the diff of the failing output if run with "make test
         TEST_VERBOSE=1".

     ENHANCEMENT
         dbroweval now includes documentation for  how  to  output  extra  rows.
         Suggestion from Yuri Pradkin.

     BUG FIX
         Several  improvements  to  the Fedora package from Michael Schwendt via
         <https://bugzilla.redhat.com/show_bug.cgi?id=877096>,  and   from   the
         harsh  master that is [4mrpmlint[24m.  (I am stymied at teaching it that "out-
         liers" is spelled correctly.  Maybe I should send it  Schneier's  book.
         And an unresolvable invalid-spec-name lurks in the SRPM.)

   [1m2.42, 2013-07-31 A bug fix and packaging release.[0m
     ENHANCEMENT
         Documentation  to  dbjoin  improved  to  better describe memory usage.
         (Based on problem report by Lin Quan.)

     BUG FIX
         The [4m.spec[24m is now [4mperl-Fsdb.spec[24m to satisfy [4mrpmlint[24m.  Thanks to Christo-
         pher Meng for a specific bug report.

     BUG FIX
         Test [4mdbroweval_last.cmd[24m no longer has a column that caused failures be-
         cause of numerical instability.

     BUG FIX
         Some tests now better handle bugs in old versions of perl (5.10, 5.12).
         Thanks to Calvin Ardi for help debugging this on a Mac with  perl-5.12,
         but the fix should affect other platforms.

   [1m2.43, 2013-08-27 Adds in-file compression.[0m
     BUG FIX
         Changed the sort on [4mTEST/dbsort_merge.cmd[24m to strings (from numerics) so
         we're  less susceptible to false test-failures due to floating point IO
         differences.

     EXPERIMENTAL ENHANCEMENT
         Yet more parallelism in dbmerge: new "endgame-mode" builds a merge tree
         of processes at the end of large merge tasks to get maximal  parallel-
         ism.   Currently  this  feature is off by default because it can hang
         for some inputs.  Enable this experimental feature with "--endgame".

     ENHANCEMENT
         "Fsdb::IO" now handles being given "IO::Pipe" objects (as exercised  by
         dbmerge).

     BUG FIX
         Handling  of  NamedTmpfiles  now  supports  concurrency.  This fix will
         hopefully fix occasional "Use of uninitialized value $_ in string ne at
         ...NamedTmpfile.pm line 93." errors.

     BUG FIX
         Fsdb now requires perl 5.10.  This is a bug fix because some test cases
         used to  require  it,  but  this  fact  was  not  properly  documented.
         (Back-porting to 5.008 would require removing all "//" operators.)

     ENHANCEMENT
         Fsdb  now  handles automatic compression of file contents.  Enable com-
         pression with "dbfilealter -Z xz" (or "gz"  or  "bz2").   All  programs
         should  operate  on compressed files and leave the output with the same
         level of compression.  "xz" is recommended as fastest  and  most  effi-
         cient.   "gz" produces unrepeatable output (and so has no output test);
         it seems to insist on adding a timestamp.

   [1m2.44, 2013-10-02 A major change--all threads are gone.[0m
     ENHANCEMENT
         Fsdb is now thread free and only uses processes for parallelism.   This
         change is a big change--the entire motivation for Fsdb-2 was to exploit
         parallelism  via threading.  Parallelism--good, but perl threading--bad
         for performance.  Horribly bad for performance.  About 20x  worse  than
         pipes on my box.  (See perl bug #119445 for the discussion.)

     NEW "Fsdb::Support::Freds" provides a thread-like abstraction over forking,
         with  some nice support for callbacks in the parent upon child termina-
         tion.

     ENHANCEMENT
         Details about removing threads: "dbpipeline" is thread  free,  and  new
         tests verify each of its parts.  The easy cases are "dbcolpercentile",
         "dbcolstats", "dbfilepivot", "dbjoin", and "dbcolstatscores",
         each of which use it in simple ways  (2013-09-09).   "dbmerge"  is  now
         thread  free (2013-09-13), but was a significant rewrite, which brought
         "dbsort" along.  "dbmapreduce"  is  partly  thread  free  (2013-09-21),
         again as a rewrite, and it brings "dbmultistats" along.  Full "dbmapre-
         duce" support took much longer (2013-10-02).

     BUG FIX
         When  running  with  user-only  output ("-n"), dbroweval now resets the
         output vector $ofref after it has been output.

     NEW dbcolcreate will create all columns at the head of each  row  with  the
         "--first" option.

     NEW dbfilecat will concatenate two files, verifying that they have the same
         schema.

     ENHANCEMENT
         dbmapreduce now passes comments through, rather than eating them as be-
         fore.

         Also, dbmapreduce now supports a "--" option to prevent misinterpreting
         sub-program parameters as for dbmapreduce.

     INCOMPATIBLE CHANGE
         dbmapreduce  no  longer  figures  out if it needs to add the key to the
         output.  For multi-key-aware reducers, it never does (and cannot).  For
         non-multi-key-aware reducers, it defaults to add the key and  will  now
         fail  if  the reducer adds the key (with error "dbcolcreate: attempt to
         create pre-existing column...").   In  such  cases,  one  must  disable
         adding the key with the new option "--no-prepend-key".

     INCOMPATIBLE CHANGE
         dbmapreduce no longer copies the input field separator by default.  For
         multi-key-aware   reducers,   it   never   does   (and   cannot).   For
         non-multi-key-aware reducers, it defaults to [4mnot[24m copying the field sep-
         arator, but it will copy it (the old default) with the "--copy-fs"  op-
         tion

   [1m2.45, 2013-10-07 cleanup from de-thread-ification[0m
     BUG FIX
         Corrected a fast busy-wait in dbmerge.

     ENHANCEMENT
         Endgame  mode  enabled  in dbmerge; it (and also large cases of dbsort)
         should now exploit greater parallelism.

     BUG FIX
         Test case with "Fsdb::BoundedQueue" (gone since 2.44) now removed.

   [1m2.46, 2013-10-08 continuing cleanup of our no-threads version[0m
     BUG FIX
         Fixed some packaging details.  (Really, threads are no longer required,
         missing tests in the MANIFEST.)

     IMPROVEMENT
         dbsort now better communicates with the merge process to  avoid  bursty
         parallelism.

         Fsdb::IO::Writer can now take "-autoflush => 1" for line-buffered IO.

   [1m2.47, 2013-10-12 test suite cleanup for non-threaded perls[0m
     BUG FIX
         Removed some stray "use threads" in some test cases.   We  didn't  need
         them, and these were breaking non-threaded perls.

     BUG FIX
         Better  handling  of  Fred cleanup; should fix intermittent dbmapreduce
         failures on BSD.

     ENHANCEMENT
         Improved test framework to show output when tests  fail.   (This  time,
         for real.)

   [1m2.48, 2014-01-03 small bugfixes and improved release engineering[0m
     ENHANCEMENT
         Test  suites now skip tests for libraries that are missing.  (Patch for
         missing "IO::Compress::Xz" contributed by Calvin Ardi.)

     ENHANCEMENT
         Removed references to Jdb in the package specification.  Since the name
         was changed in 2008, there's no longer a huge need for  backwards  com-
         patibility.  (Suggestion from Petr Šabata.)

     ENHANCEMENT
         Test  suites now invoke the perl using the path from $Config{perlpath}.
         Hopefully this helps testing in environments where there  are  multiple
         installed  perls  and  the default perl is not the same as the perl-un-
         der-test (as happens in cpantesters.org).

     BUG FIX
         Added specific encoding to this manpage to account  for  Unicode.   Re-
         quired to build correctly against perl-5.18.

   [1m2.49,  2014-01-04 bugfix to unicode handling in Fsdb IO (plus minor packaging[0m
     [1mfixes)[0m
     BUG FIX
         Restored a line in the [4m.spec[24m to chmod g-s.

     BUG FIX
         Unicode decoding is now handled correctly for programs that  read  from
         standard  input.   (Also: New test scripts cover unicode input and out-
         put.)

     BUG FIX
         Fix to Fsdb documentation encoding line.   Addresses  test  failure  in
         perl-5.16  and  earlier.   (Who knew "encoding" had to be followed by a
         blank line.)

   [1m2.50, 2014-05-27 a quick release for spec tweaks[0m
     ENHANCEMENT
         In dbroweval, the "-N" (no output, even comments)  option  now  implies
         "-n", and it now suppresses the header and trailer.

     BUG FIX
         A few more tweaks to the [4mperl-Fsdb.spec[24m from Petr A abata.

     BUG FIX
         Fixed 3 uses of "use v5.10" in test suites that were causing test fail-
         ures (due to warnings, not real failures) on some platforms.

   [1m2.51,  2014-09-05  Feature  enhancements  to  dbcolmovingstats,  dbcolcreate,[0m
     [1mdbmapreduce, and new sqlselect_to_db[0m
     ENHANCEMENT
         dbcolcreate now has a "--no-recreate-fatal" that causes  it  to  ignore
         creation of existing columns (instead of failing).

     ENHANCEMENT
         dbmapreduce  once  again  is  robust  to  reducers that output the key;
         "--no-prepend-key" is no longer mandatory.

     ENHANCEMENT
         dbcolsplittorows can now enumerate the output rows with "-E".

     BUG FIX
         dbcolmovingstats is more mathematically robust.   Previously  for  some
         inputs  and  some  platforms,  floating  point rounding could sometimes
         cause square roots of negative numbers.
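
         The failure mode is easy to reproduce in any language: a variance  com-
         puted  from  running sums can come out as a tiny negative number when
         the window holds nearly identical values.  The Python sketch below (not
         Fsdb's Perl, and not a claim about how dbcolmovingstats computes  its
         statistics)  shows one common guard, clamping the variance at zero.  The
         name "moving_stats" and the choice to report 0.0 for  a  one-element
         window are assumptions made for the example.

```python
import math
from collections import deque


def moving_stats(values, window):
    """Return (mean, sample stddev) over a sliding window, guarding the sqrt."""
    buf = deque()
    total = 0.0      # running sum of the window
    total_sq = 0.0   # running sum of squares of the window
    out = []
    for x in values:
        buf.append(x)
        total += x
        total_sq += x * x
        if len(buf) > window:
            old = buf.popleft()
            total -= old
            total_sq -= old * old
        n = len(buf)
        mean = total / n
        if n > 1:
            variance = (total_sq - n * mean * mean) / (n - 1)
            # Rounding can push this slightly below zero; clamp before sqrt.
            stddev = math.sqrt(max(variance, 0.0))
        else:
            stddev = 0.0   # assumed convention for a single sample
        out.append((mean, stddev))
    return out


stats = moving_stats([1.0, 2.0, 3.0], window=3)
flat = moving_stats([0.1] * 50, window=5)
```

         Without the clamp, math.sqrt raises ValueError as soon as the computed
         variance dips below zero.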

     NEW sqlselect_to_db converts the output of the  MySQL  or  MariaDB  select
         command into fsdb format.

     INCOMPATIBLE CHANGE
         dbfilediff now outputs the [4msecond[24m row when doing sloppy numeric compar-
         isons, to better support test suites.

   [1m2.52, 2014-11-03 Fixing the test suite for line number changes.[0m
     ENHANCEMENT
         Test suite changes to be robust to the exact line numbers of  failures,
         since different Perl releases fail on different lines.
         <https://bugzilla.redhat.com/show_bug.cgi?id=1158380>

   [1m2.53, 2014-11-26 bug fixes and stability improvements to dbmapreduce[0m
     ENHANCEMENT
         dbfilediff now supports a "--quiet" option.

     ENHANCEMENT
         Better documentation of dbpipeline_filter.

     BUGFIX
         Added  groff-base and perl-podlators to the Fedora package spec.  Fixes
         <https://bugzilla.redhat.com/show_bug.cgi?id=1163149>.  (Also in  pack-
         age 2.52-2.)

     BUGFIX
         An  important  stability improvement to dbmapreduce.  It, plus dbmulti-
         stats, and dbcolstats  now  support  controlled  parallelism  with  the
         "--parallelism=N"  option.   They  default  to  run with the number of
         available CPUs.  dbmapreduce also moderates its level  of  parallelism.
         Previously it would create reducers as needed, causing CPU thrashing if
         reducers ran much slower than data production.

     BUGFIX
         The  combination  of  dbmapreduce  with  dbrowenumerate now works as it
         should.  (The obscure bug was  an  interaction  with  dbcolcreate  with
         non-multi-key  reducers that output their own key.  dbmapreduce has too
         many useful corner cases.)

   [1m2.54, 2014-11-28  fix  for  the  test  suite  to  correct  failing  tests  on[0m
     [1mnot-my-platform[0m
     BUGFIX
         Sigh,  the  test suite now has a test suite.  Because, yes, I broke it,
         causing many incorrect failures at cpantesters.  Now fixed.

   [1m2.55, 2015-01-05 many spelling fixes and dbcolmovingstats tests are more  ro-[0m
     [1mbust to different numeric precision[0m
     ENHANCEMENT
         dbfilediff now can be extra quiet, as I continue to try to track down a
         numeric difference on FreeBSD AMD boxes.

     ENHANCEMENT
         dbcolmovingstats  gave  different test output (just reflecting rounding
         error) when stddev approaches zero.  We  now  detect  and  handle  this
         case.   See <https://rt.cpan.org/Public/Bug/Display.html?id=101220> and
         thanks to H. Merijn Brand for the bug report.

     BUG FIX
         Many, many spelling bugs found by H. Merijn Brand; thanks for  the  bug
         report.

     INCOMPATIBLE CHANGE
         A  number  of programs had misspelled "separator" in "--fieldseparator"
         and "--columnseparator" options as "seperator".   These  are  now  cor-
         rectly spelled.

   [1m2.56, 2015-02-03 fix against Getopt::Long-2.43's stricter error checking[0m
     BUG FIX
         Internal argument parsing uses Getopt::Long, but mixed pass-through and
         <>.  Bug reported by Petr Pisar at
         <https://bugzilla.redhat.com/show_bug.cgi?id=1188538>.

     BUG FIX
         Added missing BuildRequires for "XML::Simple".

   [1m2.57, 2015-04-29 Minor changes, with better performance from dbmultistats.[0m
     BUG FIX
         dbfilecat now honors "--remove-inputs" (previously  it  didn't).   This
         omission  meant  that  dbmapreduce  (and dbmultistats) would accumulate
         files in [4m/tmp[24m when running.  Bad news for inputs with 4M keys.

     ENHANCEMENT
         dbmultistats should be faster with lots of small keys.  dbcolstats  now
         supports "-k" to get some of the functionality of dbmultistats (if data
         is pre-sorted and median/quartiles are not required).

   [1m2.58, 2015-04-30 Bugfix in dbmerge[0m
     BUG FIX
         Fixed a case where dbmerge suffered mojibake in endgame mode.  This bug
         surfaced  when dbsort was applied to large files (big enough to require
         merging) with unicode in them; the symptom was something like:

             Wide character in print at /usr/lib64/perl5/IO/Handle.pm line 420, <GEN12> line 111.

   [1m2.59,  2016-09-01  Collect  a  few small bug fixes and documentation improve-[0m
     [1mments.[0m
     BUG FIX
         More IO is explicitly marked UTF-8 to avoid Perl's tendency to mojibake
         on otherwise valid unicode input.  This change helps html_table_to_db.

     ENHANCEMENT
         dbcolscorrelate now cross-references dbcolsregression.

     ENHANCEMENT
         Documentation for dbrowdiff now clarifies that the default is  baseline
         mode.

     BUG FIX
         dbjoin  now  propagates  "-T"  into  the  sorting process (if it is re-
         quired).  Thanks to Lan Wei for reporting this bug.

   [1m2.60, 2016-09-04 Adds support for hash joins.[0m
     ENHANCEMENT
         dbjoin now supports hash joins with "-t lefthash" and  "-t  righthash".
         Hash  joins  cache a table in memory, but do not require that the other
         table be sorted.  They are ideal when joining a large table  against  a
         small one.
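
         For readers unfamiliar with the technique, a hash join can be sketched
         in a few lines of Python (not Fsdb's Perl).  This is a generic illus-
         tration  of  the idea, not dbjoin's code: the function "hash_join" and
         the sample columns are hypothetical, and it performs  only  an  inner
         join.

```python
def hash_join(cached_rows, streamed_rows, key):
    """Inner-join two sequences of dict rows on `key` using an in-memory index."""
    # Build phase: cache the (ideally small) table, indexed by join key.
    index = {}
    for row in cached_rows:
        index.setdefault(row[key], []).append(row)
    # Probe phase: stream the other table in whatever order it arrives.
    for row in streamed_rows:
        for match in index.get(row[key], ()):
            yield {**match, **row}


hosts = [{"host": "b", "site": "east"}, {"host": "a", "site": "west"}]
samples = [
    {"host": "a", "rtt": "12"},
    {"host": "c", "rtt": "40"},
    {"host": "b", "rtt": "9"},
    {"host": "a", "rtt": "15"},
]
joined = list(hash_join(hosts, samples, "host"))
```

         Neither input is sorted, and memory use grows only with the size of the
         cached table, which is why caching the smaller side pays off.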

   [1m2.61, 2016-09-05 Support left and right outer joins.[0m
     ENHANCEMENT
         dbjoin  now  handles  left and right outer joins with "-t left" and "-t
         right".

     ENHANCEMENT
         dbjoin  hash  joins  are  now  selected  with  "-m  lefthash"  and  "-m
         righthash" (not the short-lived "-t righthash" option).  (Technically
         this change is incompatible with Fsdb-2.60, but no one but me ever used
         that version.)

   [1m2.62, 2016-11-29 A new yaml_to_db and other minor improvements.[0m
     ENHANCEMENT
         Documentation for xml_to_db now includes sample output.

     NEW yaml_to_db converts a specific form of YAML to fsdb.

     BUG FIX
         The test suite now uses "diff -c -b" rather than  "diff  -cb"  to  make
         OpenBSD-5.9 happier, I hope.

     ENHANCEMENT
         Comments  that  log  operations  at  the end of each file now do simple
         quoting of spaces.  (It is not guaranteed to be fully shell-compliant.)

     ENHANCEMENT
         There is a new standard option, "--header", allowing one to specify  an
         Fsdb  header for inputs that lack it.  Currently it is supported by db-
         coldefine, dbrowuniq, dbmapreduce, dbmultistats, dbsort, dbpipeline.

     ENHANCEMENT
         dbfilepivot now allows the [1m--possible-pivots [22moption, and if it is  pro-
         vided processes the data in one pass.

     ENHANCEMENT
         dbroweval logs are now quoted.

   [1m2.63,  2017-02-03  Re-add  some  features supposedly in 2.62 but not, and add[0m
     [1mmore --header options.[0m
     ENHANCEMENT
         The option [1m-j [22mis now a synonym for [1m--parallelism[22m.  (And several
         documentation bugs about this option are fixed.)

     ENHANCEMENT
         Additional  support  for  "--header"  in  dbcolmerge, dbcol, dbrow, and
         dbroweval.

     BUG FIX
         Version 2.62 was supposed to have this improvement, but  did  not  (and
         now  does): dbfilepivot now allows the [1m--possible-pivots [22moption, and if
         it is provided processes the data in one pass.

     BUG FIX
         Version 2.62 was supposed to have this improvement, but  did  not  (and
         now does): dbroweval logs are now quoted.

   [1m2.64, 2017-11-20 several small bugfixes and enhancements[0m
     BUG FIX
         In dbroweval, the "next row" option previously did not correctly set up
         "_last_fieldname".  It now does.

     ENHANCEMENT
         The  csv_to_db  converter  now has an optional "-F x" option to set the
         field separator.

     ENHANCEMENT
         Finally dbcolsplittocols has a "--header" option, and a new "-N" option
         to give the list of resulting output columns.

     INCOMPATIBLE CHANGE
         Now dbcolstats and dbmultistats produce no output (but a  schema)  when
         given  no  input but a schema.  Previously they gave a null row of out-
         put.  The "--output-on-no-input" and "--no-output-on-no-input"  options
         can control this behavior.

   [1m2.65, 2018-02-16 Minor release, bug fix and -F option.[0m
     ENHANCEMENT
         dbmultistats  and  dbmapreduce now both take a "-F x" option to set the
         field separator.

     BUG FIX
         Fixed missing "use Carp" in dbcolstats.  Also went back and cleaned  up
         all uses of croak().  Thanks to Zefram for the bug report.

   [1m2.66, 2018-12-20 Critical bug fix in dbjoin.[0m
     BUG FIX
         Removed  old  tests  from  MANIFEST.  (Thanks to Hang Guo for reporting
         this bug.)

     IMPROVEMENT
         Errors for non-existing input files now include the bad  filename  (be-
         fore:  "cannot  setup filehandle", now: "cannot open input: cannot open
         TEST/bad_filename").

     BUG FIX
         Hash joins with three identical rows were failing  with  the  assertion
         failure  "internal  error:  confused about overflow" due to a now-fixed
         bug.

   [1m2.67, 2019-07-10 add support for reading and writing hdfs[0m
     IMPROVEMENT
         dbformmail now has an "mh" mechanism that writes messages to individual
         files (an mh-style mailbox).

     BUG FIX
         dbrow failed to include the Carp library, leading to failures on croak.

     BUG FIX
         Fixed the dbjoin error message for an unsorted right stream, which was
         incorrect (it said left).

     IMPROVEMENT
         All  Fsdb programs can now read from and write to HDFS, when files that
         start with "hdfs:" are given to -i and -o options.

   [1m2.68, 2019-09-19 All programs now support automatic  decompression  based  on[0m
     [1mfile extension.[0m
     IMPROVEMENT
         The omitted-possible-error test case for dbfilepivot now has an
         alternative output that I saw on some BSD-running systems (thanks to
         CPAN).

     IMPROVEMENT
         dbmerge and dbmerge2 now support "--header".  dbmerge2 now gives better
         error messages when presented the wrong number of inputs.

     BUG FIX
         dbsort  now  works  with  "--header"  even when the file is big (due to
         fixes to dbmerge).

     IMPROVEMENT
         csv_to_db now processes data with the "binary" option, allowing  it  to
         handle newlines embedded in quoted fields.

     IMPROVEMENT
         All programs now transparently decompress input files given as filename
         arguments that end with a standard compression extension (.gz, .bz2,
         or .xz).
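
         Choosing a decompressor from the file extension can be sketched as
         follows in Python.  This is an illustration only, not how Fsdb does it
         internally; the names OPENERS and open_input are hypothetical.

```python
import bz2
import gzip
import lzma
import os
import tempfile

# Map each standard extension to a function that opens that format.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_input(path):
    """Open path as text, decompressing if its extension is recognized."""
    extension = os.path.splitext(path)[1]
    opener = OPENERS.get(extension, open)  # plain open() for other files
    return opener(path, "rt", encoding="utf-8")


# Demonstration: write a small gzip-compressed file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.fsdb.gz")
    with gzip.open(path, "wt", encoding="utf-8") as out:
        out.write("#fsdb experiment duration\nufs_mab_sys 37.2\n")
    with open_input(path) as src:
        lines = src.read().splitlines()
```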

   [1m2.69, 2019-11-22 a small bugfix in dbcolstats[0m
     BUG FIX
         Filled in the test case for autodecompress, which was missing for
         the 2.68 release.

     ENHANCEMENT
         The groff program is required for build, and the "Makefile.PL" fails if
         groff  is missing at build time.  Thanks to Chris Williams for suggest-
         ing this check, and the CPAN auto-building system for trying many plat-
         forms.

     BUG FIX
         The dbcolstats program had numerical instability that sometimes resulted
         in failing with a square-root of a negative  number  when  many  values
         varied  right  at  the edge of floating-point precision.  We now detect
         and report that case as 0 stddev.  Thanks to Hang Guo for  providing  a
         test case.
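
         This kind of failure can be illustrated with a  short  Python  sketch.
         It assumes the common sum-of-squares formula for variance; whether db-
         colstats uses exactly this formula is an assumption,  and  the  name
         sample_stddev is hypothetical.

```python
import math


def sample_stddev(values):
    """Sample standard deviation via running sums, clamped at zero."""
    n = len(values)
    if n < 2:
        return 0.0
    total = sum(values)
    total_sq = sum(v * v for v in values)
    # Rounding error can push this difference slightly below zero when
    # all values are nearly identical.
    variance = (total_sq - total * total / n) / (n - 1)
    if variance < 0.0:
        variance = 0.0  # report zero rather than fail in sqrt()
    return math.sqrt(variance)


nearly_constant = sample_stddev([0.1] * 10)
spread = sample_stddev([1.0, 2.0, 3.0])
```

         Without the clamp, math.sqrt raises an error whenever rounding  leaves
         the computed variance negative.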

   [1m2.70, 2020-11-12 Some small quality-of-life enhancements and corner-case bug-[0m
     [1mfixes.[0m
     ENHANCEMENT
         dbcol  can now take an option "-a" to include all columns, allowing re-
         ordering of certain columns while passing the rest through.

     ENHANCEMENT
         dbrowuniq and dbmerge now buffer comments in a way that the last row of
         data output is no longer in the last block of comments.  (The  data  is
         identical,  but for humans looking at output, this change makes it less
         likely to lose the last row.)

     BUG FIX
         dbmultistats and dbpipeline documentation now indicates that they
         support "--header" (something they have done since version 2.62 on
         2016-11-29, but is only now documented).

     ENHANCEMENT
         dbcolcreate now supports "--header".

     BUG FIX
         Fixed several spelling errors in deprecated programs and removed infor-
         mation about the no-longer existing FreeBSD and MacOS ports.  Thanks to
         Calvin Ardi for the patch.

     BUG FIX
         dbmerge now handles --xargs when only one file is provided (and  passes
         the file through unchanged).  It also throws a clean error with --xargs
         if  zero files are provided.  (To support dbmerge, dbcol now has an in-
         ternal "--saveoutput" option.)  Thanks to Yuri  Pradkin  for  reporting
         the unhandled corner-case.

   [1m2.71, 2020-11-16 Fix a race condition breaking test suites.[0m
     BUG FIX
         Suppress a race condition in dbcolmerge that was sometimes throwing the
         error "Fsdb::Support::Freds: ending, but running process: dbmerge:xargs"
         in the dbmerge_0_xargs test case, on exit.

   [1m2.72, 2020-12-01 A small bug and a packaging improvement.[0m
     BUG FIX
         dbcolhisto  now  handles  the  degenerate case where everything has the
         same value (previously it would throw "illegal division by zero").

     ENHANCEMENT
         The spec for Fedora now includes "make" as BuildRequires, something re-
         quired for Fedora 34.

   [1m2.73, 2021-05-18 Updates dbcolpercentile with  "--weighted",  and  with  more[0m
     [1mipv6.[0m
     ENHANCEMENT
         dbcolpercentile now has a "--weighted" option.

     ENHANCEMENT
         The  new  Fsdb::Support::IPv6 package includes ipv6_normalize, ipv6_ze-
         roize to rewrite ipv6 print addresses in IPv6 normal form, with a 0  in
         each 4-nybble field.

   [1m2.74, 2021-06-23 More ipv6.[0m
     ENHANCEMENT
         Fsdb::Support::IPv6 package includes ipv6_fullhex to rewrite ipv6 print
         addresses as full, 128-bit hex values.

   [1m2.75, 2022-04-02 New type specifications in the schema to better support type[0m
     [1mconversions in python.[0m
     ENHANCEMENT
         Add  optional type specifications to the schema.  Types are not used in
         Perl, but are relevant in Python and Go Fsdb  bindings.   Types  use  a
         subset  of  perl  pack specifiers: c, s, l, q are signed 8, 16, 32, and
         64-bit integers, f is a float, d is double float, a  is  utf-8  string,
         and > and < can force big or little endianness.  The default type
         for  everything is "a", that is, utf-8 strings.  Thanks to Wes Hardaker
         for pushing to get this long-desired feature out the door;  his  Python
         bindings need types.
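
         As a rough illustration of how a binding might use these  types,  the
         Python sketch below maps each type letter to a converter.  It  is  not
         taken from any Fsdb binding; the names CONVERTERS and convert are  hy-
         pothetical, and it ignores integer width and endianness, which do not
         affect values parsed from text.

```python
# Type letters from the list above, mapped to Python converters.
CONVERTERS = {
    "c": int, "s": int, "l": int, "q": int,  # signed integers
    "f": float, "d": float,                  # floating point
    "a": str,                                # utf-8 string (the default)
}


def convert(value, type_spec="a"):
    """Convert one text field according to its type specifier."""
    letter = type_spec.strip("<>")  # drop any endianness marker
    return CONVERTERS[letter](value)


count = convert("37", "l")
duration = convert("37.5", "d")
name = convert("ufs_mab_sys")
```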

     ENHANCEMENT
         dbcol,  dbcolcreate,  dbcolcopylast, and dbcolrename now understand and
         propagate schema types.  dbsort, dbjoin, dbmerge, dbmerge2, and
         dbfilepivot all take a new option "-t" to sort by type-inferred
         comparison, if a type is given.

     ENHANCEMENT
         dbcolstats, dbmultistats, and dbcolmovingstats now include type
         information in their output schema.  (They assume input variables are
         floats, not integers.)

     ENHANCEMENT
         Even more IPv6: the functions in Fsdb::Support::IPv6 package  now  sup-
         port strings of hex digits as an alternate encoding for IP address (and
         they  are  already the output of ipv6_fullhex), and "ip_fullhex_to_nor-
         mal" converts full hex-encoded IPv4 or IPv6 addresses to their "normal"
         form (dotted-quad or IPv6 printable format).
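
         The full-hex encoding can be illustrated with Python's standard  ipad-
         dress module.  This sketch is not the Fsdb::Support::IPv6  code:  the
         function names are hypothetical, telling IPv4 from IPv6 by the  length
         of the hex string is an assumption, and Python's compressed  IPv6  form
         may differ in detail from Fsdb's "normal" form.

```python
import ipaddress


def fullhex(addr):
    """Encode an IPv6 print-form address as 32 hex digits (128 bits)."""
    return ipaddress.IPv6Address(addr).packed.hex()


def fullhex_to_normal(hexstr):
    """Decode 8 hex digits as IPv4 or 32 hex digits as IPv6."""
    raw = bytes.fromhex(hexstr)
    if len(raw) == 4:
        return str(ipaddress.IPv4Address(raw))
    if len(raw) == 16:
        return str(ipaddress.IPv6Address(raw))
    raise ValueError("expected 8 or 32 hex digits")


encoded = fullhex("2001:db8::1")
decoded_v6 = fullhex_to_normal(encoded)
decoded_v4 = fullhex_to_normal("c0000201")
```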

   [1m3.0, 2022-04-04 Complete type support and accordingly bump major version.[0m
     NEW The major version number is now 3.0 to correspond to  the  addition  of
         types  (although they were actually added in 2.75).  Old fsdb files are
         supported (Fsdb-3.0 is backwards compatible with databases), but  older
         versions  will  confuse types in new files (new Fsdb files are not for-
         ward compatible with old versions).

     ENHANCEMENT
         Type specifications in a few more  programs:  dbcolhisto,  dbcolscorre-
         late,  dbcolsregression,  dbcolstatscores, dbrowaccumulate, dbrowcount,
         dbrowdiff, dbrvstatdiff.

     ENHANCEMENT
         dbcolhisto now puts an empty value on any empty rows.

     NEW dbcoltype redefines column types, or clears them with the "-v" option.

   [1m3.1, 2022-11-22 A post-3.0 cleanup release with minor fixes.[0m
     ENHANCEMENT
         Type specifications in a few more programs that  I  missed:  dbrowuniq,
         dbcolpercentile.

     ENHANCEMENT
         Minor documentation improvements.

   [1m3.2, 2023-10-11 Add new module dbcolsdecimate[0m
     NEW dbcolsdecimate  reduces  density in timeseries data to make graphs with
         overly dense points visually similar but smaller.

     ENHANCEMENT
         yaml_to_db now flattens one level of arrays into comma-separated lists.

     ENHANCEMENT
         Clearer installation instructions.

   [1m3.3, 2023-10-13 Quickly making dbcolsdecimate more flexible.[0m
     INCOMPATIBLE ENHANCEMENT
         dbcolsdecimate now takes either relative ([1m-p[22m) or absolute  ([1m-P[22m)  preci-
         sion,  and precision now affects only subsequent columns.  Also, if ab-
         solute precisions are given for all columns, data is not buffered.

   [1m3.4, 2024-01-05 Correct propagation of TMPDIR options into subprograms.[0m
     ENHANCEMENT
         dbcolsdecimate now has examples in its documentation.

     BUG FIX
         dbcolstats, dbmapreduce, dbcolpercentile, dbfilepivot, and dbmultistats
         now correctly propagate the temporary directory into the sort routine,
         if required.  All of these programs sometimes require sort internally,
         and previously may have failed to use the correct tmpdir when it was
         set on the command line as an option.  Thanks to Erica Stutz for
         noticing this bug.

   [1m3.5, 2024-06-07 Weighted stats added to dbcolstats and dbcolscorrelate[0m
     ENHANCEMENT
         dbrowdiff  now has a "--future" option that compares incrementally with
         the next row rather than the previous.

     ENHANCEMENT
         dbcolstats and dbcolscorrelate now optionally do  weighted  stats  with
         the "--weight" option.

   [1m3.6,  2024-06-08 Bugfix in weighted stats for dbcolscorrelate; column options[0m
     [1min dbrowdiff.[0m
     BUG FIX
         dbcolscorrelate was not properly applying weighting.

     ENHANCEMENT
         dbrowdiff now has "-A" and "-P" options to set output column names.

   [1m3.7, 2024-06-19 Adding weighted stats for dbmultistats.[0m
     ENHANCEMENT
         dbmultistats now supports weighted stats.

   [1m3.8, 2024-07-01 Bugfix in dbmerge.[0m
     BUG FIX
         dbmerge now correctly handles the case when invoked with "--xargs" with
         exactly two input files.  Thanks to Erica Stutz for reporting this  er-
         ror.

   [1m3.9, 2024-08-02 Fixing dbjoin type propagation.[0m
     ENHANCEMENT
         dbrowdiff now has a -e option to specify the value to use in
         future-mode for the last row.

     BUG FIX
         dbjoin now propagates types, rather than eating them.

   [1m3.10, 2025-03-01 Bug fix for some IPv6 addresses,  snzip  support,  and  auto[0m
     [1mcompression.[0m
     ENHANCEMENT
         The  license  in  the  .spec is now SPDX-compliant.  Thanks to Miroslav
         Such~A1/2 for the patch.

     BUG FIX
         Fsdb::Support::IPv6 now handles  IPv4-embedded-in-IPv6  print-form  ad-
         dresses.

     ENHANCEMENT
         We now automatically decompress ".sz" files (assuming "snzip" is
         installed).

     ENHANCEMENT
         We now automatically compress files written with a standard compression
         extension.

   [1m3.11, 2025-03-06 internal performance enhancements[0m
     ENHANCEMENT
         Internal temporary files for sort, if required, are now compressed,
         using snzip if possible, or gzip otherwise.  (In many cases CPU is
         cheaper than I/O, particularly when the data compresses well.)

     ENHANCEMENT
         When  dbsort  overflows  to  disk, it now removes temporary files as it
         goes, when possible.  It now requires less secondary storage  for  very
         large  inputs.  (Secondary storage during merge was actually O(n log n)
         before, and now is back down to O(n), where n is input size.)

   [1m3.12, 2025-03-06 internal performance enhancements[0m
     ENHANCEMENT
         dbmerge intermediate files are now  compressed,  reducing  intermediate
         storage.

   [1m3.13, 2025-04-16 bugfix for intermediate compression[0m
     BUG FIX
         Temporary file autocompression was locked to gzip due to a bug.  We now
         correctly use snzip when it is available.

     BUG FIX
         Improved test suite handling of compressed files, and improved  packag-
         ing for RHEL-8.

   [1m3.14,  2025-06-09  New  filter  dbrowuniqcount  computes counts of all unique[0m
     [1mrows.[0m
     ENHANCEMENT
         New filter dbrowuniqcount computes counts of all unique  rows,  regard-
         less of adjacency, in O(u) memory where u is the number of unique rows.
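
         The memory bound follows from keeping one counter per  distinct  row,
         as in the Python sketch below.  It only illustrates the technique  and
         is not dbrowuniqcount's code; the function name and  sample  rows  are
         hypothetical.

```python
from collections import Counter


def unique_row_counts(rows):
    """Count identical rows wherever they occur in the input."""
    counts = Counter()
    for row in rows:
        # One dictionary entry per distinct row: memory grows with the
        # number of unique rows, not with the total number of rows.
        counts[tuple(row)] += 1
    return counts


rows = [
    ["ufs_mab_sys", "37.2"],
    ["ufs_rcp_real", "264.5"],
    ["ufs_mab_sys", "37.2"],
]
counts = unique_row_counts(rows)
```

         Because rows are matched by value rather than by position, the  input
         does not need to be sorted first.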

   [1m3.15, 2025-12-08 Bug fixes and cleanup for a messy 3.14 release.[0m
     ENHANCEMENT
         tcpdump_to_db now handles some forms of modern tcpdump output for TCP.

     ENHANCEMENT
         dbformmail  now  opens the template in unicode, and handles comma-sepa-
         rated CCs in Mail mode.

[1mAUTHOR[0m
     John Heidemann, "johnh@isi.edu"

     See "Contributors" for the many people who have contributed bug reports and
     fixes.

[1mCOPYRIGHT[0m
     Fsdb is Copyright (C) 1991-2024 by John Heidemann <johnh@isi.edu>.

     This program is free software; you can redistribute it and/or modify it un-
     der the terms of version 2 of the GNU General Public License  as  published
     by the Free Software Foundation.

     This program is distributed in the hope that it will be useful, but WITHOUT
     ANY  WARRANTY; without even the implied warranty of MERCHANTABILITY or FIT-
     NESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more
     details.

     You should have received a copy of the GNU  General  Public  License  along
     with this program; if not, write to the Free Software Foundation, Inc., 675
     Mass Ave, Cambridge, MA 02139, USA.

     A  copy  of the GNU General Public License can be found in the file ``COPY-
     ING''.

[1mCOMMENTS and BUG REPORTS[0m
     Any comments  about  these  programs  should  be  sent  to  John  Heidemann
     "johnh@isi.edu".

perl v5.42.2                       2026-06-08                            [4mFsdb[24m(3)
