The High Boskage House Baseball-Analysis Web Site
baseball team and player performance examined realistically and accurately


The HBH Baseball-Analysis Formula
and Other Similar Equations

This page is essentially notes on work in progress;
it is subject to frequent changes as work goes forth.

Run-Scoring Formulae

Team-Season-Level Results

From work discussed elsewhere, the chief competitors--if we want to put it that way--to the TOP formula are "Extrapolated Runs" (denoted XR), a linear-weights formulation, and a variant of the so-called "Technical Runs Created" equation, which, like the TOP, is multiplicative in nature. But, for various reasons, instead of the Runs-Created variant, we will here consider "Base Runs" (denoted BR), in that they are held by some to best model actual run creation.

Here, by the way, is a summary of the team-season results:

  Team-Seasons: 1138

    Method   Average Error Pct.   Average Error Size (runs)   Excess over TOP (runs)
    TOP:     2.32054013159        16.0279001468
    XR:      2.53012140594        17.4948604993               +1.46696030
    BR:      2.68385581113        18.6292217327               +2.60132158
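Both measures follow the team-seasons code given further below: Average Error Size is the mean of |projected - actual| runs, and Average Error Pct. is the mean of that same error expressed as a percentage of actual runs, with each projection first rounded to a whole number of runs. The excess figures beside XR and BR are simply their average error sizes minus TOP's. A minimal sketch of the two measures; the function `errorMetrics` and the sample numbers are hypothetical, used only for illustration:

```php
<?php
// Hypothetical helper (not from the original scripts): average error size and
// average error percentage, rounding each projection to whole runs first,
// as the team-seasons code does.
function errorMetrics(array $actual, array $projected): array
{
    $n = count($actual);
    $sumSize = 0.0;
    $sumPct = 0.0;
    for ($i = 0; $i < $n; $i++) {
        $error = round($projected[$i]) - $actual[$i];  // signed error, whole runs
        $sumSize += abs($error);                      // error size in runs
        $sumPct += 100 * abs($error) / $actual[$i];   // error as % of actual runs
    }
    return ['size' => $sumSize / $n, 'pct' => $sumPct / $n];
}

// Made-up projections for two team-seasons, just to show the call:
$m = errorMetrics([700, 650], [712.4, 640.2]);
printf("size %.2f runs, pct %.3f%%\n", $m['size'], $m['pct']);
```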

Team-Game-Level Results

About the Sample

Elsewhere the overall accuracy of these methods has been demonstrated on 1138 team-seasons. But skeptics argue that accuracy over 1138 aggregates, each of roughly 1386 to 1458 innings, can make an inaccurate formula look accurate; the acid test, they say, is accuracy at much smaller scales: the game or, ideally, the inning. Let's begin by looking at a sample of individual games.

Because per-game stat lines must be transcribed by hand from box scores, each one costs a small but definitely non-zero amount of time and effort, so for now I limited myself to two dozen such lines (from a dozen games). The number twelve comes from dividing the 55-year period into five-year steps, which seemed a satisfactory distribution. I went to some pains to make the sample as scattered and random as possible: one game every five seasons, starting with 1955 (that is, from 1955, 1960, 1965, and so on, with 2009 closing out the period); each game was taken from the opposite league to the previous game; the first game was taken at the start of its season, with each succeeding game taken about five years and two weeks after its predecessor, so that the game dates progress through the season; and for each approximate date thus arrived at, a pair of teams was chosen neither of which had yet been used. The sample includes a team held to zero runs, a team that scored 12, and a fair spread in between.
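The selection rule can be sketched as a small schedule generator. This is a hypothetical reconstruction, not the procedure actually used: `sampleSchedule` is an illustrative name, the two-week step is applied mechanically from the first sample date (April 11), so its target dates only approximate the real game dates, and the rule that each team pair be new is not modeled.

```php
<?php
// Hypothetical reconstruction of the sampling rule: one game per five seasons
// from 1955, plus 2009 to close the period, leagues alternating, and target
// dates advancing about two weeks per step.
function sampleSchedule(): array
{
    $years = range(1955, 2005, 5);  // 1955, 1960, ..., 2005
    $years[] = 2009;                // final season of the 55-year span
    $schedule = [];
    foreach ($years as $i => $year) {
        // Approximate target: the first sample's date plus 14 days per step.
        $target = (new DateTimeImmutable("$year-04-11"))
            ->modify('+' . (14 * $i) . ' days');
        $schedule[] = [
            'year'   => $year,
            'league' => ($i % 2 === 0) ? 'AL' : 'NL',  // alternate leagues
            'target' => $target->format('Y-m-d'),
        ];
    }
    return $schedule;
}

foreach (sampleSchedule() as $row) {
    echo $row['year'], '  ', $row['league'], '  ', $row['target'], "\n";
}
```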

Whether 24 game lines make a sample large enough to be meaningful (not probative, of course, but at least suggestive) is a matter that can be calculated to a nicety, but frankly I couldn't be bothered. When the A. C. Nielsen Company is paid substantial sums to determine the television-watching tastes of 300 million Americans from those of a few hundred families (not to speak of the work of organizations like Gallup and Rasmussen), I think the principle is clear. Anyone who disagrees is warmly invited to produce more data and evaluations.


Actual Results

The results are shown directly below. Following them, the methodology is presented (the exact formulae used, with numerical coefficients; the particular games and their exact data; and even the actual PHP code used for the reckonings).

(Projected runs are rounded to whole runs; each Err is projected minus actual runs.)

Date        Org  Vs.  Actual  TOP  Err   BR  Err   XR  Err  Best
1955-04-11  BAL  wsh       5    3   -2    3   -2    2   -3  TOP, BR
1955-04-11  WSH  bal      12    7   -5    7   -5    7   -5  all equal
1960-04-19  CHC  stl       2    3    1    3    1    3    1  all equal
1960-04-19  STL  chc       5    6    1    7    2    6    1  TOP, XR
1965-05-04  CHW  det      10   10    0    7   -3    7   -3  TOP
1965-05-04  DET  chw       6    6    0    4   -2    4   -2  TOP
1970-05-16  ATL  cin       0    1    1    2    2    2    2  TOP
1970-05-16  CIN  atl       2    5    3    6    4    6    4  TOP
1975-06-01  BOS  min      11    8   -3    9   -2    9   -2  BR, XR
1975-06-01  MIN  bos       9   11    2    9    0    8   -1  BR
1980-06-15  HOU  pit       1    2    1    3    2    3    2  TOP
1980-06-15  PIT  hou       4    3   -1    3   -1    3   -1  all equal
1985-07-01  CAL  tex       5    5    0    5    0    5    0  all equal
1985-07-01  TEX  cal      10    8   -2    8   -2    7   -3  TOP, BR
1990-07-23  NYM  phi       4    4    0    4    0    4    0  all equal
1990-07-23  PHI  nym       7    5   -2    4   -3    4   -3  TOP
1995-07-30  CLE  sea       5    5    0    5    0    4   -1  TOP, BR
1995-07-31  SEA  cle       2    3    1    3    1    3    1  all equal
2000-08-16  FLA  lad       4    6    2    6    2    6    2  all equal
2000-08-16  LAD  fla      10    7   -3    7   -3    7   -3  all equal
2005-09-04  NYY  oak       7   11    4   11    4   10    3  XR
2005-09-04  OAK  nyy       3    4    1    3    0    3    0  BR, XR
2009-10-04  SDP  sfo       3    3    0    2   -1    2   -1  TOP
2009-10-04  SFO  sdp       4    5    1    6    2    5    1  TOP, XR


Summary

Method  Average     Cumulative  Uniquely  Tied for  All    Not
        Error Size  Error       Best      Best      Equal  Bettered
TOP:    1.50        0           7/24      5/24      8/24   20/24 (83.3%)
BR:     1.83        -4          1/24      5/24      8/24   14/24 (58.3%)
XR:     1.88        -11         1/24      4/24      8/24   13/24 (54.2%)
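The summary columns follow mechanically from the signed errors in the per-game table: a method is uniquely best when its error size is strictly the smallest, tied for best when it shares the smallest size with exactly one other method, all equal when all three share it, and not bettered in any of those cases. A minimal sketch; the function `summarize` and its input layout (one [TOP, BR, XR] error triple per game line) are hypothetical:

```php
<?php
// Hypothetical helper: tallies the per-game summary columns from signed errors
// (projected minus actual runs), one [TOP, BR, XR] triple per game line.
function summarize(array $games): array
{
    $names = ['TOP', 'BR', 'XR'];
    $out = [];
    foreach ($names as $name) {
        $out[$name] = ['size' => 0.0, 'cum' => 0, 'unique' => 0,
                       'tied' => 0, 'equal' => 0, 'notBettered' => 0];
    }
    foreach ($games as $errors) {
        $sizes = array_map('abs', $errors);
        $best = min($sizes);
        $winners = array_keys($sizes, $best, true);  // methods sharing the smallest error
        foreach ($names as $k => $name) {
            $out[$name]['size'] += $sizes[$k];
            $out[$name]['cum'] += $errors[$k];  // signed: over- and under-estimates cancel
            if ($sizes[$k] !== $best) {
                continue;
            }
            if (count($winners) === 1) {
                $out[$name]['unique']++;
            } elseif (count($winners) === 3) {
                $out[$name]['equal']++;
            } else {
                $out[$name]['tied']++;
            }
            $out[$name]['notBettered']++;
        }
    }
    foreach ($names as $name) {
        $out[$name]['size'] /= count($games);  // average error size
    }
    return $out;
}

// The first four game lines of the table (BAL, WSH, CHC, STL):
$s = summarize([[-2, -2, -3], [-5, -5, -5], [1, 1, 1], [1, 2, 1]]);
printf("TOP: avg %.2f, cum %d, tied %d\n",
    $s['TOP']['size'], $s['TOP']['cum'], $s['TOP']['tied']);
```

Run over all 24 error triples, it reproduces the Average Error Size and Cumulative Error columns above.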

While TOP certainly looks nontrivially better at this level, we cannot forget the sample size. But it does seem safe to say that there is no suggestion of any one of these equations being far out of line--either as a success or as a failure--compared to the others.

In the great descent from aggregations of roughly 1386 to 1458 innings to aggregations of a mere 8, 9, or 10 innings, the relative strength of TOP for predicting runs from the raw data is unaltered: it remains clearly the best. The next step is to try these methods on a reasonable number of individual innings.


Half-Inning-Level Results

About the Sample

Owing to the great kindness of Adam Dorhauer, I now have available a database of per-inning stat lines. It covers all innings (technically half-innings) played from 2000 through 2008, inclusive. The accompanying email contained the caveat that "I think these stats are accurate, but since they are from my own query of the Retrosheet PBP files, there could be an error that I missed in my code, so I can't guarantee they are as accurate as something that's been rigourously checked, like the Retrosheet files themselves or other published databases." I have, however, every confidence in their accuracy.

Now while 389,042 innings may seem a lot, and in many ways is, we need to say a word about "tuning" these equations. I have not attempted to "tune" the XR equation for best results over any period, because it is not obvious how its maker would have gone about such tuning. The TOP equation contains coefficients that were initially "tuned" to minimize error over the 55-year period they encompass (and I would have used more years were commensurable stats available). The BR equation is intended, according to its maker, David Smyth, to be "tuned" by the value of one coefficient: "If you want to tailor a version to a particular dataset (such as 1993-2004, or the 1975 AL), all you have to do is determine the overall B multiplier." The team-season results above were so tuned, and so also were the individual-inning results (the nominal coefficient is 1.1, which the per-game results used; the 55-season team value is 1.0889, and the 9-season inning value 1.05069). The TOP coefficients for the per-inning work were also re-tuned (for details, see the Methodologies section farther below).
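Smyth's remark leaves the tuning criterion open, and the text does not say exactly how 1.0889 and 1.05069 were found. One simple reading, assumed here purely for illustration, is to choose the multiplier at which total projected runs equal total actual runs; because BR projections rise with the multiplier (for lines with positive B and C), bisection finds it. The function names `baseRuns` and `tuneKbr`, the search bounds, and the criterion itself are all assumptions; minimizing average error size instead would need a search rather than bisection.

```php
<?php
// BaseRuns per the BR equation in the formula section; $kbr is the B multiplier.
function baseRuns(array $s, float $kbr): float
{
    $h = $s['1B'] + $s['2B'] + $s['3B'] + $s['HR'];
    $tb = $s['1B'] + 2 * $s['2B'] + 3 * $s['3B'] + 4 * $s['HR'];
    $a = $h + $s['BB'] + $s['HB'] - $s['HR'] - 0.5 * $s['IBB'];
    $b = $kbr * (1.4 * $tb - 0.6 * $h - 3.0 * $s['HR']
        + 0.1 * ($s['BB'] - $s['IBB'] + $s['HB'])
        + 0.9 * ($s['SB'] - $s['CS'] - $s['GDP']));
    $c = $s['AB'] - $h + $s['CS'] + $s['GDP'];
    return $a * $b / ($b + $c) + $s['HR'];
}

// Assumed tuning rule: bisect on the multiplier until total projected runs
// match total actual runs over the dataset.
function tuneKbr(array $lines, float $lo = 0.5, float $hi = 2.0): float
{
    $excess = function (float $kbr) use ($lines): float {
        $total = 0.0;
        foreach ($lines as $s) {
            $total += baseRuns($s, $kbr) - $s['R'];
        }
        return $total;
    };
    for ($i = 0; $i < 60; $i++) {
        $mid = ($lo + $hi) / 2;
        if ($excess($mid) > 0) {
            $hi = $mid;  // projecting too many runs: shrink the multiplier
        } else {
            $lo = $mid;  // projecting too few runs: grow it
        }
    }
    return ($lo + $hi) / 2;
}

// Two lines from the raw game data (1955 BAL, 2005 NYY), only to show the call:
$lines = [
    ['R' => 5, 'AB' => 32, '1B' => 3, '2B' => 2, '3B' => 1, 'HR' => 0, 'BB' => 1,
     'IBB' => 0, 'HB' => 2, 'SB' => 0, 'CS' => 0, 'GDP' => 1],
    ['R' => 7, 'AB' => 39, '1B' => 9, '2B' => 1, '3B' => 1, 'HR' => 2, 'BB' => 9,
     'IBB' => 0, 'HB' => 1, 'SB' => 2, 'CS' => 0, 'GDP' => 1],
];
printf("Kbr = %.5f\n", tuneKbr($lines));
```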

My own opinion is that there is a fair chance that an individual-innings database covering all or most of the 55-year period would yield slightly worse results for BR, because it seems slightly more sensitive to "tuning" than the others, which is to say more sensitive to the average interactions of play that change over the decades in baseball. Remember that the 55-year period saw three distinct baseballs (that of 1955 to 1976, that of 1977 to 1992, and that of 1994 to 2009, with 1993 a "transition" season), and that doesn't even touch on the effects of the DH and other such phenomena. Obliged to average across a longer period with greater variance in those interactions, the equations might well give at least somewhat different results. If anyone has a database of inning stats other than or larger than 2000 - 2008, I'd love a copy.


Actual Results

All that said, here are the actual results:

Method   Average Error Size
         (runs per half-inning)
BR:      0.232548157
TOP:     0.241989297
XR:      0.261624709

In short, over those 9 seasons BaseRuns was about .009 of a run per inning more accurate than TOP, and about .029 of a run per inning better than XR.

All of these calculations bear out the original contention that most or all of the major run-production estimation formulae work about equally well.




Methodologies

The Actual Formulae

So that there can be no confusion over what we are speaking of, here are the exact equations:

  TOP:
  
    PA = AB + BB + HB + SH + SF + CI
    NetOB = PA - Outs
    WeightedHits = (K1 x [1B + Eb]) + (K2 x 2B) + (K3 x 3B) + (K4 x HR)
    FreePasses = BB + HB + CI
    GrossAdvanceFactor = WeightedHits + (Kbbhb x FreePasses) + (Ksh x SH) + (Ksb x SB)
    AdvanceFactor = GrossAdvanceFactor / PA
    Block = AdvanceFactor x Kslope
    Multiplier = Block + Kb
    Rtop = ([NetOB - HR] x Multiplier) + HR

      where, for 55 years (1955 - 2009) of team-season data:
       Kbbhb=1;
       K1=2.38782
       K2=3.37
       K3=6.09
       K4=3.7704
       Ksb=1.52
       Ksh= -0.4859
       Kslope=0.499377343455
       Kb= -0.0600151086521

      and where, for 9 years (2000 - 2008) of inning data:
       Kbbhb=1;
       K1=2.4118
       K2=3.347
       K3=6.076
       K4=3.7991
       Ksb=1.5424
       Ksh= -0.454
       Kslope=0.488796403119
       Kb= -0.0547215599553

  BR:

    PartA = H + BB + HB - HR - (0.5 x IBB)
    PartB = ([1.4 x TB] - [0.6 x H] - [3.0 x HR] + [0.1 x {BB - IBB + HB}] + [0.9 x {SB - CS - GDP}]) x Kbr
    PartC = AB - H + CS + GDP
    PartD = HR
    Rbr = (PartA x [PartB / {PartB + PartC}]) + PartD

      where, for 55 years (1955 - 2009) of team-season data:
      Kbr=1.0889

      and where, for 9 years (2000 - 2008) of inning data:
      Kbr=1.05069

  XR:

    Rxr = (0.50 x 1B) +
          (0.72 x 2B) +
          (1.04 x 3B) +
          (1.44 x HR) +
          (0.18 x SB) +
          (0.04 x SH) +
          (0.37 x SF) +
          (0.25 x IBB) -
          (0.32 x CS) -
          (0.098 x SO) -
          (0.37 x GDP) +
          (0.34 x [HB + BB - IBB]) -
          (0.090 x [AB - H - SO])
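For checking hand calculations, the three equations translate directly into PHP. This is a sketch, not the original scripts: the function names and array keys are illustrative, H and TB are derived from the hit types, and BR is shown with the nominal multiplier 1.1, the value the per-game script uses.

```php
<?php
// TOP as written above; $k holds one of the coefficient sets.
function topRuns(array $s, array $k): float
{
    $pa = $s['AB'] + $s['BB'] + $s['HB'] + $s['SH'] + $s['SF'] + $s['CI'];
    $netOB = $pa - $s['Outs'];
    $weightedHits = $k['K1'] * ($s['1B'] + $s['Eb']) + $k['K2'] * $s['2B']
        + $k['K3'] * $s['3B'] + $k['K4'] * $s['HR'];
    $freePasses = $s['BB'] + $s['HB'] + $s['CI'];
    $gross = $weightedHits + $k['Kbbhb'] * $freePasses
        + $k['Ksh'] * $s['SH'] + $k['Ksb'] * $s['SB'];
    $multiplier = ($gross / $pa) * $k['Kslope'] + $k['Kb'];  // Block + Kb
    return ($netOB - $s['HR']) * $multiplier + $s['HR'];
}

// BR as written above; $kbr is the B multiplier.
function baseRuns(array $s, float $kbr): float
{
    $h = $s['1B'] + $s['2B'] + $s['3B'] + $s['HR'];
    $tb = $s['1B'] + 2 * $s['2B'] + 3 * $s['3B'] + 4 * $s['HR'];
    $a = $h + $s['BB'] + $s['HB'] - $s['HR'] - 0.5 * $s['IBB'];
    $b = $kbr * (1.4 * $tb - 0.6 * $h - 3.0 * $s['HR']
        + 0.1 * ($s['BB'] - $s['IBB'] + $s['HB'])
        + 0.9 * ($s['SB'] - $s['CS'] - $s['GDP']));
    $c = $s['AB'] - $h + $s['CS'] + $s['GDP'];
    return $a * $b / ($b + $c) + $s['HR'];
}

// XR as written above (BB is all walks, intentional included).
function extrapolatedRuns(array $s): float
{
    $h = $s['1B'] + $s['2B'] + $s['3B'] + $s['HR'];
    return 0.50 * $s['1B'] + 0.72 * $s['2B'] + 1.04 * $s['3B'] + 1.44 * $s['HR']
        + 0.18 * $s['SB'] + 0.04 * $s['SH'] + 0.37 * $s['SF'] + 0.25 * $s['IBB']
        - 0.32 * $s['CS'] - 0.098 * $s['SO'] - 0.37 * $s['GDP']
        + 0.34 * ($s['HB'] + $s['BB'] - $s['IBB'])
        - 0.090 * ($s['AB'] - $h - $s['SO']);
}

// 55-year team-season TOP coefficients, as listed above.
$k55 = ['Kbbhb' => 1, 'K1' => 2.38782, 'K2' => 3.37, 'K3' => 6.09,
        'K4' => 3.7704, 'Ksb' => 1.52, 'Ksh' => -0.4859,
        'Kslope' => 0.499377343455, 'Kb' => -0.0600151086521];

// The 2005-09-04 NYY line from the raw game data below (7 actual runs).
$nyy = ['AB' => 39, 'BB' => 9, 'HB' => 1, 'SH' => 0, 'SF' => 0, 'CI' => 0,
        '1B' => 9, '2B' => 1, '3B' => 1, 'HR' => 2, 'IBB' => 0, 'SB' => 2,
        'CS' => 0, 'SO' => 8, 'GDP' => 1, 'Eb' => 0, 'Outs' => 27];

printf("TOP %.2f  BR %.2f  XR %.2f\n",
    topRuns($nyy, $k55), baseRuns($nyy, 1.1), extrapolatedRuns($nyy));
```

Rounded to whole runs, these projections come to 11, 11, and 10, the figures in the 2005 NYY row of the per-game results.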

Raw Game Data

Because errors in transcribing box-score data are possible, here, for anyone who wants to cross-check or spot-check them, are the data used:


Date      Org  Vs.    R   AB   BB   HB   SH   SF   CI   PA   1B   2B   3B   HR    H   TB  IBB   SB   CS   SO  GDP   Eb Outs
19550411  BAL  wsh    5   32    1    2    1    0    0   36    3    2    1    0    6   10    0    0    0    3    1    2   27
19550411  WSH  bal   12   33    5    2    0    2    0   42    6    4    0    0   10   14    0    1    0    4    0    1   24
19600419  CHC  stl    2   32    4    0    1    0    0   37    6    0    0    1    7   10    1    0    0   11    1    1   27
19600419  STL  chc    5   31    6    0    1    0    0   38    8    2    0    1   11   16    1    0    1    2    1    0   24
19650504  CHW  det   10   38    5    1    1    1    0   46   10    2    1    0   13   17    0    0    1    7    1    5   27
19650504  DET  chw    6   36    4    1    0    0    0   41    7    1    0    1    9   13    0    0    0   12    2    2   27
19700516  ATL  cin    0   31    3    0    1    0    0   35    3    2    0    0    5    7    0    0    0    3    0    0   27
19700516  CIN  atl    2   32    3    0    0    0    0   35    7    2    0    1   10   15    0    3    0    9    1    0   24
19750601  BOS  min   11   38    3    0    0    1    0   42    6    2    0    4   12   26    0    0    0    5    0    0   27
19750601  MIN  bos    9   39   10    0    0    0    0   49   10    0    1    1   12   17    0    1    0    5    1    3   27
19800615  HOU  pit    1   32    3    0    1    0    0   36    7    1    0    0    8    9    0    1    0    3    1    0   27
19800615  PIT  hou    4   31    0    0    0    0    0   31    7    1    0    1    9   13    0    1    1    3    1    0   24
19850701  CAL  tex    5   34    2    0    0    1    0   37   10    1    0    1   12   16    0    0    0    2    2    2   27
19850701  TEX  cal   10   33    4    0    1    0    0   38    7    3    1    1   12   20    0    0    0    4    2    2   24
19900723  NYM  phi    4   35    2    0    1    0    0   38    5    2    0    1    8   13    1    1    0    7    0    1   27
19900723  PHI  nym    7   30    8    0    1    0    0   39    5    0    1    0    6    8    1    2    0    8    0    1   24
19950730  CLE  sea    5   34    5    0    0    0    0   39    5    0    0    2    7   13    0    1    0    7    1    1   27
19950731  SEA  cle    2   31    1    0    0    0    0   32    2    1    0    2    5   12    0    0    0    6    0    0   27
20000816  FLA  lad    4   35    4    0    0    0    0   39   11    1    0    1   13   17    0    0    1    8    2    0   27
20000816  LAD  fla   10   34    4    1    1    1    0   41    8    1    1    1   11   17    0    2    1    6    0    1   27
20050904  NYY  oak    7   39    9    1    0    0    0   49    9    1    1    2   13   22    0    2    0    8    1    0   27
20050904  OAK  nyy    3   34    4    0    0    0    0   38    5    1    0    1    7   11    0    0    1    4    1    2   27
20091004  SDP  sfo    3   34    5    1    1    0    0   41    4    0    0    1    5    8    1    0    1    8    1    1   30
20091004  SFO  sdp    4   39    3    0    0    0    0   42   10    1    0    1   12   16    1    3    0   14    2    0   30

The datum Eb is meant to be the number of times an opponent's error let a batter who would otherwise have been out reach base safely. That datum, though the Official Scorer must record it for every game, is not (so far as I can find) published in individual box scores, though Baseball-Reference.com has it for seasonal totals. For these transcriptions the total of all opponents' errors was used instead; this may introduce an occasional, rather small bias, which has been neglected here. The datum Outs is simply the opponents' innings pitched times 3. It is unclear whether CI (catcher's interference) is recorded in box scores, but every line used did balance (PA = R + LOB + Outs).
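Spot-checking a transcribed line amounts to testing a few identities. A minimal sketch, with `checkLine` a hypothetical helper: it checks PA, H, and TB against their components and, because LOB is not among the columns above, checks the PA = R + LOB + Outs balance only when an LOB figure is supplied.

```php
<?php
// Hypothetical spot-check helper: returns a list of identities that fail
// for one transcribed line (an empty list means the line is consistent).
function checkLine(array $s, ?int $lob = null): array
{
    $problems = [];
    if ($s['PA'] !== $s['AB'] + $s['BB'] + $s['HB'] + $s['SH'] + $s['SF'] + $s['CI']) {
        $problems[] = 'PA != AB + BB + HB + SH + SF + CI';
    }
    if ($s['H'] !== $s['1B'] + $s['2B'] + $s['3B'] + $s['HR']) {
        $problems[] = 'H != 1B + 2B + 3B + HR';
    }
    if ($s['TB'] !== $s['1B'] + 2 * $s['2B'] + 3 * $s['3B'] + 4 * $s['HR']) {
        $problems[] = 'TB != 1B + 2x2B + 3x3B + 4xHR';
    }
    // Box-score balance; LOB must come from the box score itself.
    if ($lob !== null && $s['PA'] !== $s['R'] + $lob + $s['Outs']) {
        $problems[] = 'PA != R + LOB + Outs';
    }
    return $problems;
}

// The 1955-04-11 BAL line from the table above:
$bal = ['PA' => 36, 'AB' => 32, 'BB' => 1, 'HB' => 2, 'SH' => 1, 'SF' => 0, 'CI' => 0,
        '1B' => 3, '2B' => 2, '3B' => 1, 'HR' => 0, 'H' => 6, 'TB' => 10,
        'R' => 5, 'Outs' => 27];
$problems = checkLine($bal);
echo $problems ? implode("\n", $problems) : 'OK', "\n";
```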


Actual Code

So that everything is open and aboveboard, shown below are the actual PHP code snippets used to make the calculations:


Team-Seasons Code

While the exact code varies from equation to equation, most of it is the same:

<?php

  $title='BASELINE';  // substitute identifier for particular equation

  // General Constants:
  $lf=chr(10);
  $crlf=chr(13).$lf;
  $br='<br/>'.$lf;
  $p='<br/>'.$br;

  // Accuracy Counters:
  $teamseasons=0;
  $cumr=0;
  $cumerror=0;
  $cumsize=0;
  $cumpct=0;
  $cumerrsq=0;
  
  // Local Constants:
/*
    INSERT REQUIRED CONSTANTS FOR PARTICULAR EQUATION BEING EVALUATED
*/
 
  // Setup & Run:
  $main=file('FullBat.ByTeam');
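  $dummy=array();  // header line
  $pos=array();    // lines with positive errors, keyed for sorting
  $negs=array();   // lines with negative errors
  $zeros=array();  // lines with zero error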
  foreach ($main as $line)
  {
    $line=rtrim($line);
    if (strpos($line,'Season')!==FALSE)
    {
      $dummy[]='    Season       Org      Runs      Proj       Err       Pct'.$lf;
      continue;
    }
    $season=trim(substr($line,0,10));
    if ($season=='1954') continue;  // for just comparison with others...    
    $teamseasons=$teamseasons+1;
    
    $org=trim(substr($line,10,10));
    $league=trim(substr($line,20,10));
    $games=trim(substr($line,30,10));
    $runs=trim(substr($line,40,10));
    $pa=trim(substr($line,50,10));
    $ab=trim(substr($line,60,10));
    $sgl=trim(substr($line,70,10));
    $dbl=trim(substr($line,80,10));
    $tpl=trim(substr($line,90,10));
    $hr=trim(substr($line,100,10));
    $bb=trim(substr($line,110,10));
    $hb=trim(substr($line,120,10));
    $sb=trim(substr($line,130,10));
    $cs=trim(substr($line,140,10));
    $so=trim(substr($line,150,10));
    $sh=trim(substr($line,160,10));
    $sf=trim(substr($line,170,10));
    $gdp=trim(substr($line,180,10));
    $ibb=trim(substr($line,190,10));
    $ci=trim(substr($line,200,10));
    $eb=trim(substr($line,210,10));
    $outs=trim(substr($line,220,10));
    $lob=trim(substr($line,230,10));
    $or=trim(substr($line,240,10));
    $wins=trim(substr($line,250,10));
    
    $truepa=$ab+$bb+$hb+$sh+$sf+$ci;
    $hits=$sgl+$dbl+$tpl+$hr;
    $tb=$sgl+(2*$dbl)+(3*$tpl)+(4*$hr);


    // Calculate Runs:
/*

CODE PARTICULAR TO GIVEN EQUATION GOES HERE - TYPICALLY JUST A FEW LINES

ends with--

    $proj=  WHATEVER

*/
    $proj=round($proj);
    

    // Collect Accuracy Data:
    $error=$proj-$runs;
    $errorsize=abs($error);
    
    $cumr=$cumr+$runs;
    $cumerror=$cumerror+$error;  // +/- cancel
    $cumsize=$cumsize+$errorsize;
    $rawpct=100*($error/$runs);
    $cumpct=$cumpct+abs($rawpct);
    $cumerrsq=$cumerrsq+($errorsize*$errorsize);

    $rawpct2=round($rawpct,2);
    $pct=abs($rawpct2);
    $dot=strpos($pct,'.');
    if ($dot!==FALSE)
    {
      $whole=substr($pct,0,$dot);
      $frac=substr($pct,1+$dot);
     } else {
      $whole=$pct;
      $frac='00';
    }
    if (strlen($whole)==1) $whole=' '.$whole;
    if (strlen($frac)==1) $frac=$frac.'0';

    if ($rawpct==0)
    {
      $pct=' 0   ';
     } else {
      if ($rawpct>0)
      {
        $pct='+'.$whole.'.'.$frac;
       } else {
        $pct='-'.$whole.'.'.$frac;
      }
    }
    
    $line='      '.$season.'       '.$org.substr('         '.$runs,-10).substr('         '.$proj,-10).
            substr('         '.$error,-10).substr('         '.$pct,-10).$lf;
    if ($rawpct==0)
    {
      $zeros[$pct.' '.$season.$org]=$line;
     } else {
      if ($rawpct>0)
      {
        $pos[$pct.' '.$season.$org]=$line;
       } else {
        $negs[$pct.' '.$season.$org]=$line;
      }
    }
    
  }

  ksort($pos);
  krsort($negs);
  $outfile=array_merge($dummy,$negs,$zeros,$pos);

  $error=$cumerror/$teamseasons;
  $size=$cumsize/$teamseasons;
  $pct=$cumpct/$teamseasons;
  $cumerrsq=$cumerrsq/($teamseasons-1);
  $sd=sqrt($cumerrsq);
  
  $handle=fopen($title.'.calcs','wb');
  foreach($outfile as $line)
  {fwrite($handle,$line);}
  fwrite($handle,'-------------------------'.$lf);
  fwrite($handle,'Cumulative Error: '.$cumerror.$lf);
  fwrite($handle,'Per-TmYr Error: '.$error.$lf);
  fwrite($handle,'Average Error Size: '.$size.$lf);
  fwrite($handle,'Average Error Pct.: '.$pct.$lf);
  fwrite($handle,'Standard Deviation: '.$sd.$lf);
  fwrite($handle,'Negative: '.(    round(100*count($negs)/$teamseasons,1)).'%'.$lf);
  fwrite($handle,'Zero: '.(round(100*count($zeros)/$teamseasons,1)).'%'.$lf);
  fwrite($handle,'Positive: '.(round(100*count($pos)/$teamseasons,1)).'%'.$lf);
  fwrite($handle,'-------------------------'.$lf);
  fclose($handle);
  
  echo '==================================='.$lf;
  echo 'Error Pct. = '.$pct.$lf;
  echo 'Error Size = '.$size.$lf;
  echo $lf;  
  echo 'Done.'.$lf.$lf;

 
?>

Per-Game Code

<?php

  $title='PerGame';

  // General Constants:
  $lf=chr(10);
  $p=$lf.$lf;

  // Local Constants:
  //   top:
  $Kbbhb=1;
  $K1=2.38782;
  $K2=3.37;
  $K3=6.09;
  $K4=3.7704;
  $Ksb=1.52;
  $Ksh= -0.4859;
  $Kslope=0.499377343455;
  $Kb= -0.0600151086521;
  //   xr:
  // Local Constants:
  $xK1=0.50;
  $xK2=0.72;
  $xK3=1.04;
  $xK4=1.44;
  $xKsb=0.18;
  $xKsh=0.04;
  $xKsf=0.37;
  $xKibb=0.25;
  $xKcs=0.32;
  $xKso=0.098;
  $xKdp=0.37;
  $xKhb=0.34;
  $xKq=0.09;

 
  // Setup & Run:
  $main=file('game.data');
  $samples=(count($main))-1;
  $outfile=NULL;
  $sizetop=0;
  $cumtop=0;
  $sizebr=0;
  $cumbr=0;
  $sizexr=0;
  $cumxr=0;
  foreach ($main as $line)
  {
    $line=rtrim($line);
    if (strpos($line,'Date')!==FALSE)
    {
      $outfile[]='Date      Org  Vs. Runs  TOP  Err   BR  Err   XR  Err   better'.$lf;
      continue;
    }
    $build1=substr($line,0,18);  // start of output line
    $runs=trim(substr($line,18,5));
    $ab=trim(substr($line,23,5));
    $bb=trim(substr($line,28,5));
    $hb=trim(substr($line,33,5));
    $sh=trim(substr($line,38,5));
    $sf=trim(substr($line,43,5));
    $ci=trim(substr($line,48,5));
    $pa=trim(substr($line,53,5));
    $sgl=trim(substr($line,58,5));
    $dbl=trim(substr($line,63,5));
    $tpl=trim(substr($line,68,5));
    $hr=trim(substr($line,73,5));
    $hits=trim(substr($line,78,5));
    $tb=trim(substr($line,83,5));
    $ibb=trim(substr($line,88,5));
    $sb=trim(substr($line,93,5));
    $cs=trim(substr($line,98,5));
    $so=trim(substr($line,103,5));
    $gdp=trim(substr($line,108,5));
    $eb=trim(substr($line,113,5));
    $outs=trim(substr($line,118,5));

    $build=$build1.substr('    '.$runs,-5);
    

    // Calculate TOP:
    $rlob=$pa-$outs;
    $wtb=($K1*($sgl+$eb))+($K2*$dbl)+($K3*$tpl)+($K4*$hr);
    $bbhb=$bb+$hb+$ci;
    $factor=$wtb+($Kbbhb*$bbhb)+($Ksh*$sh)+($Ksb*$sb);
    $factor=$factor/$pa;
    $block=$factor*$Kslope;
    $multiplier=$block+$Kb;
    $proj=(($rlob-$hr)*$multiplier)+$hr;

    $proj0=round($proj);
    $terror0=$proj0-$runs;
    $build=$build.substr('    '.$proj0,-5).substr('    '.$terror0,-5);
    
    $sizetop=$sizetop+abs($terror0);
    $cumtop=$cumtop+$terror0;


    // Calculate BaseRuns:
    $parta=$hits+$bb+$hb-$hr-(0.5*$ibb);
    $partb=1.1*((1.4*$tb)-(0.6*$hits)-(3.0*$hr)+(0.1*($bb-$ibb+$hb))+(0.9*($sb-$cs-$gdp)));
    $partc=$ab-$hits+$cs+$gdp;
    $partd=$hr;
    $proj=($parta*($partb/($partb+$partc)))+$partd;

    $proj0=round($proj);
    $berror0=$proj0-$runs;
    $build=$build.substr('    '.$proj0,-5).substr('    '.$berror0,-5);
    
    $sizebr=$sizebr+abs($berror0);
    $cumbr=$cumbr+$berror0;


    // Calculate XR:
    $proj=($xK1*$sgl)
          + ($xK2*$dbl)
          + ($xK3*$tpl)
          + ($xK4*$hr)
          + ($xKsb*$sb)
          + ($xKsh*$sh)
          + ($xKsf*$sf)
          + ($xKibb*$ibb)
          - ($xKcs*$cs)
          - ($xKso*$so)
          - ($xKdp*$gdp)
          + ($xKhb*($hb+$bb-$ibb))
          - ($xKq*($ab-$hits-$so));

    $proj0=round($proj);
    $xerror0=$proj0-$runs;
    $build=$build.substr('    '.$proj0,-5).substr('    '.$xerror0,-5);
    
    $sizexr=$sizexr+abs($xerror0);
    $cumxr=$cumxr+$xerror0;

    // Flag each method that beats TOP on this line:
    if (abs($berror0)<abs($terror0)) $build=$build.'   br';
    if (abs($xerror0)<abs($terror0)) $build=$build.'   xr';
    $build=$build.$lf;
    
    $outfile[]=$build;
    

  }
  
  $sizetop=$sizetop/$samples;
  $sizebr=$sizebr/$samples;
  $sizexr=$sizexr/$samples;
  

  $handle=fopen($title.'.calcs','wb');
  foreach($outfile as $line)
  {fwrite($handle,$line);}
  fclose($handle);

  echo $lf;  
  echo '==================================='.$lf;
  echo $lf;
  echo '  TOP Avg. Error Size: '.$sizetop.$lf;
  echo '       TOP Cum. Error: '.$cumtop.$lf;
  echo $lf;
  echo 'BaseR Avg. Error Size: '.$sizebr.$lf;
  echo '     BaseR Cum. Error: '.$cumbr.$lf;
  echo $lf;
  echo '   XR Avg. Error Size: '.$sizexr.$lf;
  echo '        XR Cum. Error: '.$cumxr.$lf;
  echo $lf;
  echo 'Done.'.$lf.$lf;

 
?>

Per-Inning Code

<?php

  $title='PerInning';

  // General Constants:
  $lf=chr(10);
  $p=$lf.$lf;

  // FUNCTIONS:
  function cleanup($proj)
  {
    //   Standardize Projection Precision For Display:
    //   nine decimal places with an explicit sign, and a blank instead of a sign for zero;
    //   sprintf avoids the exponent notation (e.g. "1.0E-5") that PHP can produce
    //   when a very small float is converted to a string
    $proj=round($proj,9);
    if ($proj==0) return ' 0.000000000';
    return sprintf('%+.9f',$proj);
  }
  
  
  $datafile='InningData2000-2008.csv';
  $delimiter=',';

  // Local Constants:
  //   top:
  $Kbbhb=1;
  $K1=2.4118;
  $K4=3.7991;
  $K2=3.347;
  $K3= 6.076;
  $Ksb=1.5424;
  $Ksh= -0.454;
  $Kslope=0.488796403119;
  $Kb= -0.0547215599553;
  //   xr:
  $xK1=0.50;
  $xK2=0.72;
  $xK3=1.04;
  $xK4=1.44;
  $xKsb=0.18;
  $xKsh=0.04;
  $xKsf=0.37;
  $xKibb=0.25;
  $xKcs=0.32;
  $xKso=0.098;
  $xKdp=0.37;
  $xKhb=0.34;
  $xKq=0.09;
  //   br:
  $Kbr= 1.05069;
 
  // Initialize Accumulators:
  $sizetop0=0;
  $sizetop=0;
  $cumtop0=0;
  $cumtop=0;
  
  $sizebr0=0;
  $sizebr=0;
  $cumbr0=0;
  $cumbr=0;
  
  $sizexr0=0;
  $sizexr=0;
  $cumxr0=0;
  $cumxr=0;
  $counter=0;

  
  // Setup:
  $rhandle=fopen($datafile,'rb');
  if ($rhandle===FALSE)
  {
    echo 'Could not open input file for reading!'.$p;
    exit;
  }
  $whandle=fopen('PerInning','wb');
  if ($whandle===FALSE)
  {
    echo 'Could not open output file for writing!'.$p;
    exit;
  }
  $outfile=NULL;
  $label='     Inning ID  Runs       '.
            'TOP  Err   FullTOP             '.
            'BR  Err    FullBR             '.
            'XR  Err    FullXR';
  fwrite($whandle,$label.$lf);
  
  
  // Run:
  while (($line=fgets($rhandle))!==FALSE)  // read until the input file is exhausted
  {
    $line=trim($line);
//    echo $lf.$line.$p;
    if ($line==='') continue;  // skip blank lines, such as a trailing newline
    if (strpos($line,'"outs"')!==FALSE) continue;  // ignore header line
    
    $stats=explode($delimiter,$line);  // convert string to stats array
    $id=trim($stats[0],'"');  // inning id with double-quotes stripped off
    $ab=$stats[1];
    $ubb=$stats[2];  // all walks that are not intentional
    $ibb=$stats[3];
    $hb=$stats[4];
    $ci=$stats[5];
    $outs=$stats[6];
    $so=$stats[7];
    $sh=$stats[8];
    $sf=$stats[9];
    $sgl=$stats[10];
    $dbl=$stats[11];
    $tpl=$stats[12];
    $hr=$stats[13];
    $sb=$stats[14];
    $cs=$stats[15];
    $eb=$stats[16];
/*
Eb is "defined as any play where the batter reached base without getting a hit on a
play where there was at least 1 error, which isn't perfect, but probably close enough."
*/
    $runs=$stats[17];
    $gdp=0;  // no GDP field is read from the inning file, so BR and XR are figured with GDP as zero

    $hits=$sgl+$dbl+$tpl+$hr;
    $tb=$sgl+(2*$dbl)+(3*$tpl)+(4*$hr);
    $bb=$ubb+$ibb;
    $pa=$ab+$bb+$hb+$sh+$sf+$ci;
    if ($pa==0) continue;  // guard against dividing by zero plate appearances

/*
// Show 1st data line, for test:
echo 'id: '.$id.$lf;
echo 'ab: '.$ab.$lf;
echo 'ubb: '.$ubb.$lf;
echo 'ibb: '.$ibb.$lf;
echo 'bb: '.$bb.$lf;
echo 'hb: '.$hb.$lf;
echo 'ci: '.$ci.$lf;
echo 'outs: '.$outs.$lf;
echo 'so: '.$so.$lf;
echo 'sh: '.$sh.$lf;
echo 'sf: '.$sf.$lf;
echo 'sgl: '.$sgl.$lf;
echo 'dbl: '.$dbl.$lf;
echo 'tpl: '.$tpl.$lf;
echo 'hr: '.$hr.$lf;
echo 'sb: '.$sb.$lf;
echo 'cs: '.$cs.$lf;
echo 'eb: '.$eb.$lf;
echo 'runs: '.$runs.$lf;
echo 'pa: '.$pa.$lf;
fclose($rhandle);
fclose($whandle);
exit;
*/
    $build=$id.substr('    '.$runs,-5);  // start of output line

    // TOP:

    //   Calculate TOP:
    $rlob=$pa-$outs;
    $wtb=($K1*($sgl+$eb))+($K2*$dbl)+($K3*$tpl)+($K4*$hr);
    $bbhb=$bb+$hb+$ci;
    $factor=$wtb+($Kbbhb*$bbhb)+($Ksh*$sh)+($Ksb*$sb);
    $factor=$factor/$pa;
    $block=$factor*$Kslope;
    $multiplier=$block+$Kb;
    $proj=(($rlob-$hr)*$multiplier)+$hr;  // true result as fraction

    //   Figure Errors:
    //     figure:
    $error=$proj-$runs;    // error of exact prediction (non-integer values allowed)
    $proj0=round($proj);   // results rounded to nearest whole integer
    $error0=$proj0-$runs;  // error of actual (integer) prediction
    //     record:
    $sizetop0=$sizetop0+abs($error0);  // size of actual (integer) prediction error
    $sizetop=$sizetop+abs($error);    // size of exact (non-integer) error
    $cumtop0=$cumtop0+$error0;       // cumulating signed error of actual (integer) prediction
    $cumtop=$cumtop+$error;         // cumulating signed error of exact (non-integer) prediction

    //   Standardize Projection Precision For Display:
    $proj=cleanup($proj);
    $build=$build.substr('         '.$proj0,-10).substr('    '.$error0,-5).'   '.$proj;


    // BaseRuns:

    //   Calculate BaseRuns:
    $parta=$hits+$bb+$hb-$hr-(0.5*$ibb);
    $partb=$Kbr*((1.4*$tb)-(0.6*$hits)-(3.0*$hr)+(0.1*($bb-$ibb+$hb))+(0.9*($sb-$cs-$gdp)));
    $partc=$ab-$hits+$cs+$gdp;
    $partd=$hr;
    $proj=($parta*($partb/($partb+$partc)))+$partd;

    //   Figure Errors:
    //     figure:
    $error=$proj-$runs;    // error of exact prediction (non-integer values allowed)
    $proj0=round($proj);   // results rounded to nearest whole integer
    $error0=$proj0-$runs;  // error of actual (integer) prediction
    //     record:
    $sizebr0=$sizebr0+abs($error0);  // size of actual (integer) prediction error
    $sizebr=$sizebr+abs($error);    // size of exact (non-integer) error
    $cumbr0=$cumbr0+$error0;       // cumulating signed error of actual (integer) prediction
    $cumbr=$cumbr+$error;         // cumulating signed error of exact (non-integer) prediction

    //   Standardize Projection Precision For Display:
    $proj=cleanup($proj);
    $build=$build.substr('         '.$proj0,-10).substr('    '.$error0,-5).'   '.$proj;


    // Extrapolated Runs:

    //   Calculate XR:
    $proj=($xK1*$sgl) 
          + ($xK2*$dbl)
          + ($xK3*$tpl)
          + ($xK4*$hr)
          + ($xKsb*$sb)
          + ($xKsh*$sh)
          + ($xKsf*$sf)
          + ($xKibb*$ibb)
          - ($xKcs*$cs)
          - ($xKso*$so)
          - ($xKdp*$gdp)
          + ($xKhb*($hb+$bb-$ibb))
          - ($xKq*($ab-$hits-$so));

    //   Figure Errors:
    //     figure:
    $error=$proj-$runs;    // error of exact prediction (non-integer values allowed)
    $proj0=round($proj);   // results rounded to nearest whole integer
    $error0=$proj0-$runs;  // error of actual (integer) prediction
    //     record:
    $sizexr0=$sizexr0+abs($error0);  // size of actual (integer) prediction error
    $sizexr=$sizexr+abs($error);    // size of exact (non-integer) error
    $cumxr0=$cumxr0+$error0;       // cumulating signed error of actual (integer) prediction
    $cumxr=$cumxr+$error;         // cumulating signed error of exact (non-integer) prediction

    //   Standardize Projection Precision For Display:
    $proj=cleanup($proj);
    $build=$build.substr('         '.$proj0,-10).substr('    '.$error0,-5).'   '.$proj;


    // Write This File Line:
    $build=$build.$lf;    
    fwrite($whandle,$build);
    $counter=$counter+1;
    if (($counter%10000)==0) echo substr('   '.$counter,-6).$lf;  // visual progress display

    $pa=0;  // pre-set, to detect if file finished

// if ($counter>9) break;  // for testing

  }
  
  // Close Open Files:
  fclose($rhandle);
  
  // Calculate Per-Inning Averages:
  $t0avg=$sizetop0/$counter;
  $b0avg=$sizebr0/$counter;
  $x0avg=$sizexr0/$counter;
  $tavg=$sizetop/$counter;
  $bavg=$sizebr/$counter;
  $xavg=$sizexr/$counter;
  
  // Create Summary:
  $summary=NULL;
  $summary[]='==================================='.$lf;
  $summary[]='         Innings: '.$counter.$lf;
  $summary[]=$lf;
  $summary[]='TOP:'.$lf;
  $summary[]='----'.$lf;
  $summary[]='   Average Error: '.$t0avg.$lf;
  $summary[]='Cumulative Error: '.$cumtop0.$lf;
  $summary[]='Exact Cum. Error: '.$cumtop.$lf;
  $summary[]=$lf;
  $summary[]='BR:'.$lf;
  $summary[]='---'.$lf;
  $summary[]='   Average Error: '.$b0avg.$lf;
  $summary[]='Cumulative Error: '.$cumbr0.$lf;
  $summary[]='Exact Cum. Error: '.$cumbr.$lf;
  $summary[]=$lf;
  $summary[]='XR:'.$lf;
  $summary[]='---'.$lf;
  $summary[]='   Average Error: '.$x0avg.$lf;
  $summary[]='Cumulative Error: '.$cumxr0.$lf;
  $summary[]='Exact Cum. Error: '.$cumxr.$lf;
  $summary[]=$lf;

  // Save & Display Results:
  foreach ($summary as $line)
  {
    fwrite($whandle,$line);
    echo $line;
  }
  fclose($whandle);
  
  echo 'Done.'.$p;

 
?>


