This benchmark specifies a small program which is intended to capture
the essence of a telephone company billing application, with a realistic
balance between Input/Output activity and application calculations.
This is expected to allow
- the comparison of Input/Output time with calculation time
-
the investigation of different implementation approaches.
Note that:
- Call durations and rates vary significantly from country to country.
-
Taxes are calculated differently in different countries.
-
Billing calculations can be significantly more complicated than the ones
included in this benchmark.
Examples of these and other differences, and their effects on the
results, are discussed below.
This benchmark, therefore, does not exactly match any specific
company’s billing system. Instead, the distributions, rates,
etc., are intended to give a representative result (involving
calculations of appropriate precision and reasonable complexity) rather
than match a specific model. In general, these results will
underestimate the costs of calculation.
For some more background information and some initial results, see
http://speleotrove.com/decimal/telco.html.
Acknowledgements
Thanks are due to the application engineers who suggested and
verified the benchmark, and to
Robert Berry (IBM)
and
Pamela Taylor (SHARE) for suggestions
and improvements to the benchmark.
Specification
In this description, the notation [name=value] represents a value or
setting that might be a constant or a parameter, depending on the
implementation of the benchmark.
Inputs and Outputs
Two files are used:
For both files, buffered input and output routines must be used,
using the default buffer size for the platform.
Calculations
All calculations (except for the one case where rounding is explicit) must
be exact. If a suitable decimal type or decimal package is available
for the platform, this must be used for the calculations.
Benchmark steps
The benchmark comprises the following steps:
-
If a file with the same name as the output file, f, exists, then it
is erased or renamed.
-
At this point, timing of the benchmark begins.
-
Three variables are initialized to zero:
sumT = 0 -- sum of total prices
sumB = 0 -- sum of basic tax
sumD = 0 -- sum of 'distance' tax
-
The output text file, f, is created (opened).
-
A sequence of numbers, representing call duration times in seconds, are
read from the input file, i. Reading the number from the file,
and any initial conversion into the numerical form used by the
following calculations, is considered to be part of the Input/Output
time of the benchmark.
After each number, n, is read the following calculations are carried
out:
-
A call type indicator, c, is set from the bottom (least
significant) bit of the duration (hence c is 0 or 1).
-
A rate, r, is determined from the call type. Those calls with c=0 have
a low rate: 0.0013; the remainder (‘distance calls’) have
a ‘premium’ rate: 0.00894.
(The rates are, very roughly, in Euros or dollars per second.)
-
A price, p, for the call is then calculated (p=r*n).
This is rounded to exactly 2 fractional digits using
round-half-even (Banker’s round to nearest).
-
A basic tax, b, is calculated: b=p*0.0675 (6.75%). This is
truncated to exactly 2 fractional digits (round-down), and
the total basic tax variable is then incremented (sumB=sumB+b).
-
For distance calls: a distance tax, d, is calculated:
d=p*0.0341 (3.41%). This is truncated to exactly 2
fractional digits (round-down), and then the total distance
tax variable is incremented (sumD=sumD+d).
-
The total price, t, is calculated (t=p+b, and, if a
distance call, t=t+d).
-
The total prices variable is incremented (sumT=sumT+t).
-
The total price, t, is converted to a string, s.
The string s is then written to file f as a line (there is
therefore one value per line in the completed file).
-
Once all the numbers in the input file i have been processed, any
pending output data are flushed to disk, if possible, and then the
input file i and the output file f are closed.
-
At this point, timing of the benchmark ends.
-
The values sumT, sumB, sumD are displayed for verification. For the
short 10-number test file [infile=telco.test or infile=telco.testb]
the values should be:
sumT = 8.91
sumB = 0.50
sumD = 0.22
Details of the individual calculations (and the contents of the output
file, f) for this short test case can be found on the introduction
page at
http://speleotrove.com/decimal/telco.html.
For the 1,000,000-number file, exponential distribution with a mean of
180 [infile=expon180.1e6 or infile=expon180.1e6b], the sum values
should be:
sumT = 1004737.58
sumB = 57628.30
sumD = 25042.17
(In either case, if sumD is unexpectedly 0 then probably the BCD data
file is being used when the testcase expects the binary data file.)
Time comparisons
Comparison of the relative timing of Input/Output and calculations can
be made by having the inner steps 1-8 above be controlled by a
compile-time or run-time switch which either carries out the eight steps
of the calculation as described or prepares a four-character string to
be written to the output file as step 9. (A runtime switch is more
convenient, and its runtime cost will be neglible.)
The model used in the benchmark makes a number of simplifications in
order to increase the generality of the test. Here are some answers to
specific questions about the simplifications.
-
In the benchmark, only two rates are used, and these are equally
likely. In real systems many different rates are used. These are
dependent on a number of different variables, such as the timing of
the call (prime, off-peak, etc.), the geography of the call
(interstate, international, etc.), the type of telephone in use
(land line, mobile, etc.), and so on. Why does the benchmark
use only two, equally likely, rates?
-
The choice of two rates means that a realistic spread of rate
precisions (about 3 bits) is used in the calculations, giving a
reasonable estimate of the computational effect of multiple rates. The
50% choice is roughly the proportion between low-rate and high-rate
calls in countries with a high penetration of mobile phone usage.
The 50% ratio also means that the split can be determined fairly and
cheaply from the bottom (least significant) bit of the number n,
regardless of whether the number is expressed in a decimal or
binary base. This allows the data file to be a simple
‘generic random number distribution’ instead of being a
data file specific to this benchmark.
-
Only two tax rates are used. Billing schemes can be far more complex,
quite often with more than two tax authorities involved. Also there
are sometimes special plans which allow free calls up to a
fixed monthly fee, and so on. Why not add more calculations into the
benchmark?
-
Billing calculations do vary considerably. In some countries, there
is only one ‘telephone tax’. Two is common. In the USA
there can be as many as four on a single call.
The ‘two-tier’ tax system used in the benchmark was chosen
to be simpler than most billing schemes, while not being significantly
more complex than the simplest of schemes. (The 50-50 choice was made
for the same reasons as just discussed.)
If a more complex model were used, it might overestimate the amount of
calculation required for many situations. The effect of the more
complex model on the results would be an increased amount of
calculation and related logic, compared to Input/Output activity.
-
The mean call duration is 180 seconds; is this realistic?
-
This number (three minutes) was chosen as a reasonable mean, based on
statistics from a number of countries in North America, Europe and the
Pacific Rim. Quite large variations in this mean would not
significantly alter the precisions, and therefore the computational
costs, of the calculations.
-
Why is an exponential distribution used for the call times?
-
This distribution is commonly used for modeling call times. Although
there are better models (e.g., log-normal), this one is
extremely easy to generate and gives an appropriate spread of
precisions in the input numbers.
-
Only about 16 bits (4 digits) are needed for the input numbers; why
use 64 bits (15 digits)?
-
Partly to make sure the benchmark doesn’t underestimate the
amount of data to be read from disk, and partly to allow the use of a
‘generic’ random number distribution data file.
-
Why does the package require the use of decimal arithmetic (if
available)?
-
Decimal arithmetic can be used to assure the exactness of the results
(see the Decimal
FAQ).
Also, using a decimal data type reflects actual applications of this
type (the data are often held in decimal form, and using decimals
improves maintainability compared to manual scaling of integers).
Note that the format of the input file (binary or packed decimal) may
be chosen to suit the representation used by the decimal arithmetic
package used.
-
In a ‘real’ application, the input data are likely to be
derived from a database. Won’t reading the data from a file
underestimate the Input/Output costs?
-
Reading from a database should be close in speed to reading from a
disk. However, to verify this it is hoped to couple the benchmark to
a database at some point. In the meantime, measurements are being
made with the files being network files (Ethernet connected) as well
as using ‘local’ files. The network file read time should
be significantly larger than database read times and will therefore
place an upper bound on results.
Mike Cowlishaw,
mfc@speleotrove.com