The High Boskage House Baseball-Analysis Web Site: baseball team and player performance examined realistically and accurately
Run-Scoring Formulae

Team-Season-Level Results

From work discussed elsewhere, the chief competitors--if we want to put it that way--to the TOP formula are "Extrapolated Runs" (denoted XR), a linear-weights formulation, and a variant of the so-called "Technical Runs Created" equation, which, like the TOP, is multiplicative in nature. But, for various reasons, instead of the Runs-Created variant, we will here consider "Base Runs" (denoted BR), in that they are held by some to best model actual run creation. Here, by the way, is a summary of the team-season results:

Team-Seasons: 1138
TOP Average Error Pct.: 2.32054013159
XR Average Error Pct.: 2.53012140594
BR Average Error Pct.: 2.68385581113
TOP Average Error Size: 16.0279001468
XR Average Error Size: 17.4948604993 (+1.46696030 runs vs. TOP)
BR Average Error Size: 18.6292217327 (+2.60132158 runs vs. TOP)
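The two measures in the summary above can be reproduced from paired lists of actual and projected runs. The following is only a sketch of that bookkeeping: the function name `accuracy_summary` and the array layout are assumptions, and the three team-season figures in the usage lines are invented for illustration, not taken from the data set.

```php
<?php
// Hypothetical helper: average error size (in runs) and average error
// percentage over parallel arrays of actual and projected runs.
function accuracy_summary(array $actual, array $projected): array
{
    $n = count($actual);
    if ($n === 0 || $n !== count($projected)) {
        throw new InvalidArgumentException('Need two non-empty arrays of equal length.');
    }
    $sumSize = 0.0;
    $sumPct = 0.0;
    foreach ($actual as $i => $runs) {
        // Projections are rounded to whole runs before the error is taken.
        $error = round($projected[$i]) - $runs;
        $sumSize += abs($error);
        $sumPct += abs(100 * $error / $runs); // assumes $runs > 0, as for any team-season
    }
    return array('size' => $sumSize / $n, 'pct' => $sumPct / $n);
}

// Invented numbers, for illustration only:
$summary = accuracy_summary(array(700, 800, 650), array(714.2, 792.0, 650.4));
printf("Average Error Size: %.4f\nAverage Error Pct.: %.4f\n", $summary['size'], $summary['pct']);
```

Signed errors are deliberately not used for either average, since over- and under-projections would otherwise cancel.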
Team-Game-Level Results

About the Sample

Elsewhere the overall accuracy of these methods has been demonstrated as applied to 1138 team-seasons. But skeptics assert that what holds for 1138 amalgamated sets of from 1386 to 1458 innings each can falsely make an inaccurate formula seem accurate; the acid test, they say, is the accuracy of equations at much smaller scales--the game or, ideally, the inning. Let's begin by looking at a sample of some individual games.

Because I need to transcribe per-game stat lines from game box scores, there is a small but definitely non-zero amount of time and effort for each one; I thus limited myself for now to two dozen such lines (from a dozen games). The dozen games are because sampling the 55 years at roughly 5-year intervals yields a dozen games, which seemed a satisfactory distribution. I went to some pains to make the sample as scattered and random as possible: one game every five seasons, starting with 1955 (that is, from 1955, 1960, 1965, etc.); each game was taken from the league other than that of the previous game; the first game was taken at the start of its season, with succeeding games taken about five years and two weeks later, so that the game dates progress through the season; and for each approximate date thus arrived at, a pair of teams was chosen neither of which had yet been used. The sample includes games with zero runs scored, and games with 12 runs scored, and a fair spread in between.

Whether 24 game lines is a sample large enough to be meaningful (not probative, of course, but at least suggestive) is a matter that can be calculated to a nicety, but frankly I couldn't be bothered. When the A. C. Nielsen Company is paid substantial monies to determine the television-watching tastes of 300 million Americans from those of a few hundred families (not to speak of the work of folk like the Gallup and Rasmussen organizations), I think the principle is clear.
Anyone who disagrees is warmly invited to produce more data and evaluations.

Actual Results

The results are shown directly below. Following them, the methodology is presented (the exact formulae used, with numerical coefficients; the particular games and their exact data; and even the actual PHP code used for the reckonings).
Summary
While TOP certainly looks nontrivially better at this level, we cannot forget the sample size. But it does seem safe to say that there is no suggestion of any one of these equations being far out of line--either as a success or as a failure--compared to the others. In the great descent from aggregations of from 1386 to 1458 innings (more or less) to aggregations of a mere 8, 9, and 10 innings, the relative strength of TOP for predicting runs from the raw data is unaltered: it remains clearly the best. The next step is to try these methods on a reasonable number of individual innings.

Half-Inning-Level Results

About the Sample

Owing to the great kindness of Adam Dorhauer, I now have available a database of per-inning stat lines. It covers all innings (technically half-innings) played from 2000 through 2008, inclusive. The accompanying email contained the caveat that "I think these stats are accurate, but since they are from my own query of the Retrosheet PBP files, there could be an error that I missed in my code, so I can't guarantee they are as accurate as something that's been rigourously checked, like the Retrosheet files themselves or other published databases." I have, however, every confidence in their accuracy.

Now while 389,042 innings may seem a lot, and in many ways is, we need to say a word about "tuning" of these various equations. The XR equation I have not attempted to "tune" for best results over any period because it is not obvious how its maker would have gone about such tuning. The TOP equation contains coefficients that were initially "tuned" to get minimum error over the 55-year period they encompass (and I would have used more years were commensurable stats available). The BR equation is intended, according to its maker, David Smyth, to be "tuned" by the value of one coefficient: "If you want to tailor a version to a particular dataset (such as 1993-2004, or the 1975 AL), all you have to do is determine the overall B multiplier."
The team-season results above were so tuned, and so also were the individual-inning results (the nominal coefficient is 1.1; the 55-year team-season value is 1.0889, the 9-year innings value 1.05069). The TOP coefficients for the per-inning work were also re-tuned (for details, see the Methodologies section farther below).

My own opinion is that there is a fair chance that using an individual-innings database that covers all or most of the 55-year period would yield slightly worse results for BR, in that it seems slightly more sensitive to "tuning" than the others (which is to say, more sensitive to such average interactions of play as change over the decades in baseball--remember that we have, in the 55-year period, three distinct baseballs, that of 1955 to 1976, that of 1977 to 1992, and that of 1994 to 2009, with 1993 a "transition" season, and that doesn't even touch on the effects of the DH and other such phenomena). Obliged to average across a long period with greater variances in these interactions, there might well be results at least somewhat different.

If anyone has a database of inning stats other than or larger than 2000 - 2008, I'd love a copy.

Actual Results

All that said, here are the actual results:
In short, BaseRuns was, over those 9 seasons, about 0.009 of a run more accurate than TOP, and about 0.029 of a run better than XR. All of these calculations bear out the original contention that most or all of the major run-production estimation formulae work about equally well.

Methodologies

The Actual Formulae

So that there can be no confusion over what we are speaking of, here are the exact equations:

TOP:
PA = AB + BB + HB + SH + SF + CI
NetOB = PA - Outs
WeightedHits = (K1 x [1B + Eb]) + (K2 x 2B) + (K3 x 3B) + (K4 x HR)
FreePasses = BB + HB + CI
GrossAdvanceFactor = WeightedHits + (Kbbhb x FreePasses) + (Ksh x SH) + (Ksb x SB)
AdvanceFactor = GrossAdvanceFactor / PA
Block = AdvanceFactor x Kslope
Multiplier = Block + Kb
Rtop = ([NetOB - HR] x Multiplier) + HR
where, for 55 years (1955 - 2009) of team-season data:
Kbbhb=1;
K1=2.38782
K2=3.37
K3=6.09
K4=3.7704
Ksb=1.52
Ksh= -0.4859
Kslope=0.499377343455
Kb= -0.0600151086521
and where, for 9 years (2000 - 2008) of inning data:
Kbbhb=1;
K1=2.4118
K2=3.347
K3=6.076
K4=3.7991
Ksb=1.5424
Ksh= -0.454
Kslope=0.488796403119
Kb= -0.0547215599553
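As a check on the reading of the TOP equations above, here is a minimal sketch that applies them to a single stat line. The function name `top_runs` and the keyed-array layout are assumptions made for this illustration; the coefficients are the team-season set listed above, and the stat line is the first one in the game sample given later on this page (BAL, 1955-04-11).

```php
<?php
// Team-season coefficient set, as listed above.
$topSeason = array('Kbbhb' => 1, 'K1' => 2.38782, 'K2' => 3.37, 'K3' => 6.09,
    'K4' => 3.7704, 'Ksb' => 1.52, 'Ksh' => -0.4859,
    'Kslope' => 0.499377343455, 'Kb' => -0.0600151086521);

// Hypothetical helper: TOP run estimate for one stat line.
function top_runs(array $s, array $k): float
{
    $pa = $s['AB'] + $s['BB'] + $s['HB'] + $s['SH'] + $s['SF'] + $s['CI'];
    if ($pa == 0) {
        return 0.0; // empty line: nothing to estimate
    }
    $netOb = $pa - $s['Outs'];
    $weightedHits = $k['K1'] * ($s['1B'] + $s['Eb']) + $k['K2'] * $s['2B']
        + $k['K3'] * $s['3B'] + $k['K4'] * $s['HR'];
    $freePasses = $s['BB'] + $s['HB'] + $s['CI'];
    $gross = $weightedHits + $k['Kbbhb'] * $freePasses
        + $k['Ksh'] * $s['SH'] + $k['Ksb'] * $s['SB'];
    $multiplier = ($gross / $pa) * $k['Kslope'] + $k['Kb'];
    return ($netOb - $s['HR']) * $multiplier + $s['HR'];
}

// First line of the game sample (5 runs actually scored):
$bal = array('AB' => 32, 'BB' => 1, 'HB' => 2, 'SH' => 1, 'SF' => 0, 'CI' => 0,
    '1B' => 3, '2B' => 2, '3B' => 1, 'HR' => 0, 'SB' => 0, 'Eb' => 2, 'Outs' => 27);
printf("TOP estimate: %.3f\n", top_runs($bal, $topSeason));
```

For this line the estimate comes to a little under 3 runs. Passing the per-inning coefficient set instead only requires a second array with the same keys.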
BR:
PartA = H + BB + HB - HR - (0.5 x IBB)
PartB = ([1.4 x TB] - [0.6 x H] - [3.0 x HR] + [0.1 x {BB - IBB + HB}] + [0.9 x {SB - CS - GDP}]) x Kbr
PartC = AB - H + CS + GDP
PartD = HR
Rbr = (PartA x [PartB / {PartB + PartC}]) + PartD
where, for 55 years (1955 - 2009) of team-season data:
Kbr=1.0889
and where, for 9 years (2000 - 2008) of inning data:
Kbr=1.05069
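The BR equations, and the idea of tailoring the single B multiplier to a data set, can be sketched as follows. Both function names are hypothetical. The tuning routine in particular is an assumption: it simply scans a grid of multipliers and keeps the one with the smallest total absolute error, which is not necessarily how the Kbr values above were arrived at.

```php
<?php
// Hypothetical helper: BaseRuns estimate for one stat line, with B multiplier $kbr.
function base_runs(array $s, float $kbr): float
{
    $a = $s['H'] + $s['BB'] + $s['HB'] - $s['HR'] - 0.5 * $s['IBB'];
    $b = $kbr * (1.4 * $s['TB'] - 0.6 * $s['H'] - 3.0 * $s['HR']
        + 0.1 * ($s['BB'] - $s['IBB'] + $s['HB'])
        + 0.9 * ($s['SB'] - $s['CS'] - $s['GDP']));
    $c = $s['AB'] - $s['H'] + $s['CS'] + $s['GDP'];
    $d = $s['HR'];
    if ($b + $c == 0) {
        return (float) $d; // guard against an empty line
    }
    return $a * ($b / ($b + $c)) + $d;
}

// Assumed tuning method: grid scan for the multiplier that minimizes the
// total absolute error over the supplied stat lines (each must carry 'R').
function tune_kbr(array $lines, float $lo = 0.9, float $hi = 1.3, float $step = 0.0001): float
{
    $bestK = $lo;
    $bestErr = INF;
    $steps = (int) round(($hi - $lo) / $step);
    for ($i = 0; $i <= $steps; $i++) {
        $k = $lo + $i * $step;
        $err = 0.0;
        foreach ($lines as $s) {
            $err += abs(base_runs($s, $k) - $s['R']);
        }
        if ($err < $bestErr) {
            $bestErr = $err;
            $bestK = $k;
        }
    }
    return $bestK;
}

// First line of the game sample (BAL, 1955-04-11; 5 runs actually scored):
$bal = array('R' => 5, 'AB' => 32, 'H' => 6, 'TB' => 10, 'HR' => 0, 'BB' => 1,
    'IBB' => 0, 'HB' => 2, 'SB' => 0, 'CS' => 0, 'GDP' => 1);
printf("BR estimate: %.3f\n", base_runs($bal, 1.0889));
```

A grid scan is crude but has no dependencies, and because the estimate rises steadily with the multiplier for any ordinary stat line, a finer step merely refines the same answer.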
XR:
Rxr = (0.50 x 1B) +
(0.72 x 2B) +
(1.04 x 3B) +
(1.44 x HR) +
(0.18 x SB) +
(0.04 x SH) +
(0.37 x SF) +
(0.25 x IBB) -
(0.32 x CS) -
(0.098 x SO) -
(0.37 x GDP) +
(0.34 x [HB + BB - IBB]) -
(0.090 x [AB - H - SO])
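Since XR is a plain weighted sum, it is the easiest of the three to verify by hand. The sketch below applies the weights listed above to one stat line; the function name `extrapolated_runs` and the keyed-array layout are assumptions for this illustration, and the line used is the first in the game sample (BAL, 1955-04-11).

```php
<?php
// Hypothetical helper: Extrapolated Runs for one stat line, using the
// weights listed above.
function extrapolated_runs(array $s): float
{
    return 0.50 * $s['1B'] + 0.72 * $s['2B'] + 1.04 * $s['3B'] + 1.44 * $s['HR']
        + 0.18 * $s['SB'] + 0.04 * $s['SH'] + 0.37 * $s['SF'] + 0.25 * $s['IBB']
        - 0.32 * $s['CS'] - 0.098 * $s['SO'] - 0.37 * $s['GDP']
        + 0.34 * ($s['HB'] + $s['BB'] - $s['IBB'])
        - 0.090 * ($s['AB'] - $s['H'] - $s['SO']);
}

// First line of the game sample (5 runs actually scored):
$bal = array('AB' => 32, 'H' => 6, '1B' => 3, '2B' => 2, '3B' => 1, 'HR' => 0,
    'BB' => 1, 'IBB' => 0, 'HB' => 2, 'SB' => 0, 'CS' => 0, 'SH' => 1, 'SF' => 0,
    'SO' => 3, 'GDP' => 1);
printf("XR estimate: %.3f\n", extrapolated_runs($bal));
```

Being linear, the formula needs no guard against empty lines, and the estimate for a set of lines added together equals the sum of the estimates for each line.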
Raw Game Data

Because errors in transcribing box-score data are possible, here--for those who would cross-check or spot-check--are the data used:

Date     Org Vs.      R   AB   BB   HB   SH   SF   CI   PA   1B   2B   3B   HR    H   TB  IBB   SB   CS   SO  GDP   Eb Outs
19550411 BAL wsh      5   32    1    2    1    0    0   36    3    2    1    0    6   10    0    0    0    3    1    2   27
19550411 WSH bal     12   33    5    2    0    2    0   42    6    4    0    0   10   14    0    1    0    4    0    1   24
19600419 CHC stl      2   32    4    0    1    0    0   37    6    0    0    1    7   10    1    0    0   11    1    1   27
19600419 STL chc      5   31    6    0    1    0    0   38    8    2    0    1   11   16    1    0    1    2    1    0   24
19650504 CHW det     10   38    5    1    1    1    0   46   10    2    1    0   13   17    0    0    1    7    1    5   27
19650504 DET chw      6   36    4    1    0    0    0   41    7    1    0    1    9   13    0    0    0   12    2    2   27
19700516 ATL cin      0   31    3    0    1    0    0   35    3    2    0    0    5    7    0    0    0    3    0    0   27
19700516 CIN atl      2   32    3    0    0    0    0   35    7    2    0    1   10   15    0    3    0    9    1    0   24
19750601 BOS min     11   38    3    0    0    1    0   42    6    2    0    4   12   26    0    0    0    5    0    0   27
19750601 MIN bos      9   39   10    0    0    0    0   49   10    0    1    1   12   17    0    1    0    5    1    3   27
19800615 HOU pit      1   32    3    0    1    0    0   36    7    1    0    0    8    9    0    1    0    3    1    0   27
19800615 PIT hou      4   31    0    0    0    0    0   31    7    1    0    1    9   13    0    1    1    3    1    0   24
19850701 CAL tex      5   34    2    0    0    1    0   37   10    1    0    1   12   16    0    0    0    2    2    2   27
19850701 TEX cal     10   33    4    0    1    0    0   38    7    3    1    1   12   20    0    0    0    4    2    2   24
19900723 NYM phi      4   35    2    0    1    0    0   38    5    2    0    1    8   13    1    1    0    7    0    1   27
19900723 PHI nym      7   30    8    0    1    0    0   39    5    0    1    0    6    8    1    2    0    8    0    1   24
19950730 CLE sea      5   34    5    0    0    0    0   39    5    0    0    2    7   13    0    1    0    7    1    1   27
19950731 SEA cle      2   31    1    0    0    0    0   32    2    1    0    2    5   12    0    0    0    6    0    0   27
20000816 FLA lad      4   35    4    0    0    0    0   39   11    1    0    1   13   17    0    0    1    8    2    0   27
20000816 LAD fla     10   34    4    1    1    1    0   41    8    1    1    1   11   17    0    2    1    6    0    1   27
20050904 NYY oak      7   39    9    1    0    0    0   49    9    1    1    2   13   22    0    2    0    8    1    0   27
20050904 OAK nyy      3   34    4    0    0    0    0   38    5    1    0    1    7   11    0    0    1    4    1    2   27
20091004 SDP sfo      3   34    5    1    1    0    0   41    4    0    0    1    5    8    1    0    1    8    1    1   30
20091004 SFO sdp      4   39    3    0    0    0    0   42   10    1    0    1   12   16    1    3    0   14    2    0   30

The datum Eb is supposed to be opponents' errors resulting in an otherwise-out batter safely reaching base.
But that datum, though required of the Official Scorer for every game, is not (that I can find) published in individual game results (though Baseball-Reference.org has it for seasonal results). In the transcribing, the total of all opponents' errors was used, which may introduce an occasional rather small bias, which has been neglected here. The datum Outs is simply opponents' innings pitched times 3. It is unclear whether CI (catcher's interference) is recorded in box scores, but all those used did balance up (PA = R + LOB + Outs).

Actual Code

Just so that everything whatever is open and aboveboard, shown below are the actual PHP code snippets used to make the calculations:

Team-Seasons Code

While the exact code varies from equation to equation, most of it is the same:

<?php
$title='BASELINE'; // substitute identifier for particular equation
// General Constants:
$lf=chr(10);
$crlf=chr(13).$lf;
$br='<br/>'.$lf;
$p='<br/>'.$br;
// Accuracy Counters:
$teamseasons=0;
$cumerror=0;
$cumsize=0;
$cumpct=0;
$cumerrsq=0;
$cumr=0; // running total of actual runs, accumulated in the loop below
// Local Constants:
/*
INSERT REQUIRED CONSTANTS FOR PARTICULAR EQUATION BEING EVALUATED
*/
// Setup & Run:
$main=file('FullBat.ByTeam');
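// Start from empty arrays so that ksort(), krsort(), and array_merge() still work if a group ends up with no entries:
$dummy=array();
$pos=array();
$negs=array();
$zeros=array();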
foreach ($main as $line)
{
$line=rtrim($line);
if (strpos($line,'Season')!==FALSE)
{
$dummy[]=' Season Org Runs Proj Err Pct'.$lf;
continue;
}
$season=trim(substr($line,0,10));
if ($season=='1954') continue; // for just comparison with others...
$teamseasons=$teamseasons+1;
$org=trim(substr($line,10,10));
$league=trim(substr($line,20,10));
$games=trim(substr($line,30,10));
$runs=trim(substr($line,40,10));
$pa=trim(substr($line,50,10));
$ab=trim(substr($line,60,10));
$sgl=trim(substr($line,70,10));
$dbl=trim(substr($line,80,10));
$tpl=trim(substr($line,90,10));
$hr=trim(substr($line,100,10));
$bb=trim(substr($line,110,10));
$hb=trim(substr($line,120,10));
$sb=trim(substr($line,130,10));
$cs=trim(substr($line,140,10));
$so=trim(substr($line,150,10));
$sh=trim(substr($line,160,10));
$sf=trim(substr($line,170,10));
$gdp=trim(substr($line,180,10));
$ibb=trim(substr($line,190,10));
$ci=trim(substr($line,200,10));
$eb=trim(substr($line,210,10));
$outs=trim(substr($line,220,10));
$lob=trim(substr($line,230,10));
$or=trim(substr($line,240,10));
$wins=trim(substr($line,250,10));
$truepa=$ab+$bb+$hb+$sh+$sf+$ci;
$hits=$sgl+$dbl+$tpl+$hr;
$tb=$sgl+(2*$dbl)+(3*$tpl)+(4*$hr);
// Calculate Runs:
/*
CODE PARTICULAR TO GIVEN EQUATION GOES HERE - TYPICALLY JUST A FEW LINES
ends with--
$proj= WHATEVER
*/
$proj=round($proj);
// Collect Accuracy Data:
$error=$proj-$runs;
$errorsize=abs($error);
$cumr=$cumr+$runs;
$cumerror=$cumerror+$error; // +/- cancel
$cumsize=$cumsize+$errorsize;
$rawpct=100*($error/$runs);
$cumpct=$cumpct+abs($rawpct);
$cumerrsq=$cumerrsq+($errorsize*$errorsize);
$rawpct2=round($rawpct,2);
$pct=abs($rawpct2);
$dot=strpos($pct,'.');
if ($dot!==FALSE)
{
$whole=substr($pct,0,$dot);
$frac=substr($pct,1+$dot);
} else {
$whole=$pct;
$frac='00';
}
if (strlen($whole)==1) $whole=' '.$whole;
if (strlen($frac)==1) $frac=$frac.'0';
if ($rawpct==0)
{
$pct=' 0 ';
} else {
if ($rawpct>0)
{
$pct='+'.$whole.'.'.$frac;
} else {
$pct='-'.$whole.'.'.$frac;
}
}
$line=' '.$season.' '.$org.substr(' '.$runs,-10).substr(' '.$proj,-10).
substr(' '.$error,-10).substr(' '.$pct,-10).$lf;
if ($rawpct==0)
{
$zeros[$pct.' '.$season.$org]=$line;
} else {
if ($rawpct>0)
{
$pos[$pct.' '.$season.$org]=$line;
} else {
$negs[$pct.' '.$season.$org]=$line;
}
}
}
ksort($pos);
krsort($negs);
$outfile=array_merge($dummy,$negs,$zeros,$pos);
$error=$cumerror/$teamseasons;
$size=$cumsize/$teamseasons;
$pct=$cumpct/$teamseasons;
$cumerrsq=$cumerrsq/($teamseasons-1);
$sd=sqrt($cumerrsq);
$handle=fopen($title.'.calcs','wb');
foreach($outfile as $line)
{fwrite($handle,$line);}
fwrite($handle,'-------------------------'.$lf);
fwrite($handle,'Cumulative Error: '.$cumerror.$lf);
fwrite($handle,'Per-TmYr Error: '.$error.$lf);
fwrite($handle,'Average Error Size: '.$size.$lf);
fwrite($handle,'Average Error Pct.: '.$pct.$lf);
fwrite($handle,'Standard Deviation: '.$sd.$lf);
fwrite($handle,'Negative: '.( round(100*count($negs)/$teamseasons,1)).'%'.$lf);
fwrite($handle,'Zero: '.(round(100*count($zeros)/$teamseasons,1)).'%'.$lf);
fwrite($handle,'Positive: '.(round(100*count($pos)/$teamseasons,1)).'%'.$lf);
fwrite($handle,'-------------------------'.$lf);
fclose($handle);
echo '==================================='.$lf;
echo 'Error Pct. = '.$pct.$lf;
echo 'Error Size = '.$size.$lf;
echo $lf;
echo 'Done.'.$lf.$lf;
?>
Per-Game Code

<?php
$title='PerGame';
// General Constants:
$lf=chr(10);
$p=$lf.$lf;
// Local Constants:
// top:
$Kbbhb=1;
$K1=2.38782;
$K2=3.37;
$K3=6.09;
$K4=3.7704;
$Ksb=1.52;
$Ksh= -0.4859;
$Kslope=0.499377343455;
$Kb= -0.0600151086521;
// xr:
// Local Constants:
$xK1=0.50;
$xK2=0.72;
$xK3=1.04;
$xK4=1.44;
$xKsb=0.18;
$xKsh=0.04;
$xKsf=0.37;
$xKibb=0.25;
$xKcs=0.32;
$xKso=0.098;
$xKdp=0.37;
$xKhb=0.34;
$xKq=0.09;
// Setup & Run:
$main=file('game.data');
$samples=(count($main))-1;
$outfile=NULL;
$sizetop=0;
$cumtop=0;
$sizebr=0;
$cumbr=0;
$sizexr=0;
$cumxr=0;
foreach ($main as $line)
{
$line=rtrim($line);
if (strpos($line,'Date')!==FALSE)
{
$outfile[]='Date Org Vs. Runs TOP Err BR Err XR Err better'.$lf;
continue;
}
$build1=substr($line,0,18); // start of output line
$runs=trim(substr($line,18,5));
$ab=trim(substr($line,23,5));
$bb=trim(substr($line,28,5));
$hb=trim(substr($line,33,5));
$sh=trim(substr($line,38,5));
$sf=trim(substr($line,43,5));
$ci=trim(substr($line,48,5));
$pa=trim(substr($line,53,5));
$sgl=trim(substr($line,58,5));
$dbl=trim(substr($line,63,5));
$tpl=trim(substr($line,68,5));
$hr=trim(substr($line,73,5));
$hits=trim(substr($line,78,5));
$tb=trim(substr($line,83,5));
$ibb=trim(substr($line,88,5));
$sb=trim(substr($line,93,5));
$cs=trim(substr($line,98,5));
$so=trim(substr($line,103,5));
$gdp=trim(substr($line,108,5));
$eb=trim(substr($line,113,5));
$outs=trim(substr($line,118,5));
$build=$build1.substr(' '.$runs,-5);
// Calculate TOP:
$rlob=$pa-$outs;
$wtb=($K1*($sgl+$eb))+($K2*$dbl)+($K3*$tpl)+($K4*$hr);
$bbhb=$bb+$hb+$ci;
$factor=$wtb+($Kbbhb*$bbhb)+($Ksh*$sh)+($Ksb*$sb);
$factor=$factor/$pa;
$block=$factor*$Kslope;
$multiplier=$block+$Kb;
$proj=(($rlob-$hr)*$multiplier)+$hr;
$proj0=round($proj);
$terror0=$proj0-$runs;
$build=$build.substr(' '.$proj0,-5).substr(' '.$terror0,-5);
$sizetop=$sizetop+abs($terror0);
$cumtop=$cumtop+$terror0;
// Calculate BaseRuns:
$parta=$hits+$bb+$hb-$hr-(0.5*$ibb);
$partb=1.1*((1.4*$tb)-(0.6*$hits)-(3.0*$hr)+(0.1*($bb-$ibb+$hb))+(0.9*($sb-$cs-$gdp)));
$partc=$ab-$hits+$cs+$gdp;
$partd=$hr;
$proj=($parta*($partb/($partb+$partc)))+$partd;
$proj0=round($proj);
$berror0=$proj0-$runs;
$build=$build.substr(' '.$proj0,-5).substr(' '.$berror0,-5); // append to $build (not $build1) so the Runs and TOP columns are kept
$sizebr=$sizebr+abs($berror0);
$cumbr=$cumbr+$berror0;
// Calculate XR:
$proj=($xK1*$sgl)
+ ($xK2*$dbl)
+ ($xK3*$tpl)
+ ($xK4*$hr)
+ ($xKsb*$sb)
+ ($xKsh*$sh)
+ ($xKsf*$sf)
+ ($xKibb*$ibb)
- ($xKcs*$cs)
- ($xKso*$so)
- ($xKdp*$gdp)
+ ($xKhb*($hb+$bb-$ibb))
- ($xKq*($ab-$hits-$so));
$proj0=round($proj);
$xerror0=$proj0-$runs;
$build=$build.substr(' '.$proj0,-5).substr(' '.$xerror0,-5);
$sizexr=$sizexr+abs($xerror0);
$cumxr=$cumxr+$xerror0;
// Flag any formula that beat TOP on this line:
if (abs($berror0)<abs($terror0)) $build=$build.' br';
if (abs($xerror0)<abs($terror0)) $build=$build.' xr'; // compare XR's error here, not BR's a second time
$build=$build.$lf;
$outfile[]=$build;
}
$sizetop=$sizetop/$samples;
$sizebr=$sizebr/$samples;
$sizexr=$sizexr/$samples;
$handle=fopen($title.'.calcs','wb');
foreach($outfile as $line)
{fwrite($handle,$line);}
fclose($handle);
echo $lf;
echo '==================================='.$lf;
echo $lf;
echo ' TOP Avg. Error Size: '.$sizetop.$lf;
echo ' TOP Cum. Error: '.$cumtop.$lf;
echo $lf;
echo 'BaseR Avg. Error Size: '.$sizebr.$lf;
echo ' BaseR Cum. Error: '.$cumbr.$lf;
echo $lf;
echo ' XR Avg. Error Size: '.$sizexr.$lf;
echo ' XR Cum. Error: '.$cumxr.$lf;
echo $lf;
echo 'Done.'.$lf.$lf;
?>
Per-Inning Code

<?php
$title='PerInning';
// General Constants:
$lf=chr(10);
$p=$lf.$lf;
// FUNCTIONS:
function cleanup($proj)
{
// Standardize Projection Precision For Display:
$proj=round($proj,9); // standardize decimal places
$dot=strpos($proj,'.'); // get decimal-point location
if ($dot===FALSE)
{
if ($proj==0)
{
$proj=' 0.000000000'; // blank instead of +/-
} else {
$proj=$proj.'.000000000';
}
} else {
$whole=substr($proj,0,1+$dot);
$frac=substr($proj,1+$dot);
$frac=substr($frac.'000000000',0,9); // insert trailing zeros as required
$proj=$whole.$frac; // reassemble
}
if ($proj>0) $proj='+'.$proj; // minus signs and zero-blanks already provided
return $proj;
}
$datafile='InningData2000-2008.csv';
$delimiter=',';
// Local Constants:
// top:
$Kbbhb=1;
$K1=2.4118;
$K4=3.7991;
$K2=3.347;
$K3= 6.076;
$Ksb=1.5424;
$Ksh= -0.454;
$Kslope=0.488796403119;
$Kb= -0.0547215599553;
// xr:
$xK1=0.50;
$xK2=0.72;
$xK3=1.04;
$xK4=1.44;
$xKsb=0.18;
$xKsh=0.04;
$xKsf=0.37;
$xKibb=0.25;
$xKcs=0.32;
$xKso=0.098;
$xKdp=0.37;
$xKhb=0.34;
$xKq=0.09;
// br:
$Kbr= 1.05069;
// Initialize Accumulators:
$sizetop0=0;
$sizetop=0;
$cumtop0=0;
$cumtop=0;
$sizebr0=0;
$sizebr=0;
$cumbr0=0;
$cumbr=0;
$sizexr0=0;
$sizexr=0;
$cumxr0=0;
$cumxr=0;
$counter=0;
// Setup:
$rhandle=fopen($datafile,'rb');
if ($rhandle===FALSE)
{
echo 'Could not open input file for reading!'.$p;
exit;
}
$whandle=fopen('PerInning','wb');
if ($whandle===FALSE)
{
echo 'Could not open output file for writing!'.$p;
exit;
}
$outfile=NULL;
$label=' Inning ID Runs '.
'TOP Err FullTOP '.
'BR Err FullBR '.
'XR Err FullXR';
fwrite($whandle,$label.$lf);
// Run:
while (feof($rhandle)!==TRUE) // test the input handle; the output handle never reports end-of-file here
{
$line=trim(fgets($rhandle));
// echo $lf.$line.$p;
if (strpos($line,'"outs"')!==FALSE) continue; // ignore header line
$stats=explode($delimiter,$line); // convert string to stats array
$id=trim($stats[0],'"'); // inning id with double-quotes stripped off
$ab=$stats[1];
$ubb=$stats[2]; // all walks that are not intentional
$ibb=$stats[3];
$hb=$stats[4];
$ci=$stats[5];
$outs=$stats[6];
$so=$stats[7];
$sh=$stats[8];
$sf=$stats[9];
$sgl=$stats[10];
$dbl=$stats[11];
$tpl=$stats[12];
$hr=$stats[13];
$sb=$stats[14];
$cs=$stats[15];
$eb=$stats[16];
/*
Eb is "defined as any play where the batter reached base without getting a hit on a
play where there was at least 1 error, which isn't perfect, but probably close enough."
*/
$runs=$stats[17];
$gdp=0; // the inning data has no GDP column; set explicitly so the BR and XR terms below use zero
$hits=$sgl+$dbl+$tpl+$hr;
$tb=$sgl+(2*$dbl)+(3*$tpl)+(4*$hr);
$bb=$ubb+$ibb;
$pa=$ab+$bb+$hb+$sh+$sf+$ci;
if ($pa==0) break; // since feof seems not reliable
/*
// Show 1st data line, for test:
echo 'id: '.$id.$lf;
echo 'ab: '.$ab.$lf;
echo 'ubb: '.$ubb.$lf;
echo 'ibb: '.$ibb.$lf;
echo 'bb: '.$bb.$lf;
echo 'hb: '.$hb.$lf;
echo 'ci: '.$ci.$lf;
echo 'outs: '.$outs.$lf;
echo 'so: '.$so.$lf;
echo 'sh: '.$sh.$lf;
echo 'sf: '.$sf.$lf;
echo 'sgl: '.$sgl.$lf;
echo 'dbl: '.$dbl.$lf;
echo 'tpl: '.$tpl.$lf;
echo 'hr: '.$hr.$lf;
echo 'sb: '.$sb.$lf;
echo 'cs: '.$cs.$lf;
echo 'eb: '.$eb.$lf;
echo 'runs: '.$runs.$lf;
echo 'pa: '.$pa.$lf;
fclose($rhandle);
fclose($whandle);
exit;
*/
$build=$id.substr(' '.$runs,-5); // start of output line
// TOP:
// Calculate TOP:
$rlob=$pa-$outs;
$wtb=($K1*($sgl+$eb))+($K2*$dbl)+($K3*$tpl)+($K4*$hr);
$bbhb=$bb+$hb+$ci;
$factor=$wtb+($Kbbhb*$bbhb)+($Ksh*$sh)+($Ksb*$sb);
$factor=$factor/$pa;
$block=$factor*$Kslope;
$multiplier=$block+$Kb;
$proj=(($rlob-$hr)*$multiplier)+$hr; // true result as fraction
// Figure Errors:
// figure:
$error=$proj-$runs; // error of exact prediction (non-integer values allowed)
$proj0=round($proj); // results rounded to nearest whole integer
$error0=$proj0-$runs; // error of actual (integer) prediction
// record:
$sizetop0=$sizetop0+abs($error0); // size of actual (integer) prediction error
$sizetop=$sizetop+abs($error); // size of exact (non-integer) error
$cumtop0=$cumtop0+$error0; // cumulating signed error of actual (integer) prediction
$cumtop=$cumtop+$error; // cumulating signed error of exact (non-integer) prediction
// Standardize Projection Precision For Display:
$proj=cleanup($proj);
$build=$build.substr(' '.$proj0,-10).substr(' '.$error0,-5).' '.$proj;
// BaseRuns:
// Calculate BaseRuns:
$parta=$hits+$bb+$hb-$hr-(0.5*$ibb);
$partb=$Kbr*((1.4*$tb)-(0.6*$hits)-(3.0*$hr)+(0.1*($bb-$ibb+$hb))+(0.9*($sb-$cs-$gdp)));
$partc=$ab-$hits+$cs+$gdp;
$partd=$hr;
$proj=($parta*($partb/($partb+$partc)))+$partd;
// Figure Errors:
// figure:
$error=$proj-$runs; // error of exact prediction (non-integer values allowed)
$proj0=round($proj); // results rounded to nearest whole integer
$error0=$proj0-$runs; // error of actual (integer) prediction
// record:
$sizebr0=$sizebr0+abs($error0); // size of actual (integer) prediction error
$sizebr=$sizebr+abs($error); // size of exact (non-integer) error
$cumbr0=$cumbr0+$error0; // cumulating signed error of actual (integer) prediction
$cumbr=$cumbr+$error; // cumulating signed error of exact (non-integer) prediction
// Standardize Projection Precision For Display:
$proj=cleanup($proj);
$build=$build.substr(' '.$proj0,-10).substr(' '.$error0,-5).' '.$proj;
// Extrapolated Runs:
// Calculate XR:
$proj=($xK1*$sgl)
+ ($xK2*$dbl)
+ ($xK3*$tpl)
+ ($xK4*$hr)
+ ($xKsb*$sb)
+ ($xKsh*$sh)
+ ($xKsf*$sf)
+ ($xKibb*$ibb)
- ($xKcs*$cs)
- ($xKso*$so)
- ($xKdp*$gdp)
+ ($xKhb*($hb+$bb-$ibb))
- ($xKq*($ab-$hits-$so));
// Figure Errors:
// figure:
$error=$proj-$runs; // error of exact prediction (non-integer values allowed)
$proj0=round($proj); // results rounded to nearest whole integer
$error0=$proj0-$runs; // error of actual (integer) prediction
// record:
$sizexr0=$sizexr0+abs($error0); // size of actual (integer) prediction error
$sizexr=$sizexr+abs($error); // size of exact (non-integer) error
$cumxr0=$cumxr0+$error0; // cumulating signed error of actual (integer) prediction
$cumxr=$cumxr+$error; // cumulating signed error of exact (non-integer) prediction
// Standardize Projection Precision For Display:
$proj=cleanup($proj);
$build=$build.substr(' '.$proj0,-10).substr(' '.$error0,-5).' '.$proj;
// Write This File Line:
$build=$build.$lf;
fwrite($whandle,$build);
$counter=$counter+1;
if (($counter%10000)==0) echo substr(' '.$counter,-6).$lf; // visual progress display
$pa=0; // pre-set, to detect if file finished
// if ($counter>9) break; // for testing
}
// Close Open Files:
fclose($rhandle);
// Calculate Per-Inning Averages:
$t0avg=$sizetop0/$counter;
$b0avg=$sizebr0/$counter;
$x0avg=$sizexr0/$counter;
$tavg=$sizetop/$counter;
$bavg=$sizebr/$counter;
$xavg=$sizexr/$counter;
// Create Summary:
$summary=NULL;
$summary[]='==================================='.$lf;
$summary[]=' Innings: '.$counter.$lf;
$summary[]=$lf;
$summary[]='TOP:'.$lf;
$summary[]='----'.$lf;
$summary[]=' Average Error: '.$t0avg.$lf;
$summary[]='Cumulative Error: '.$cumtop0.$lf;
$summary[]='Exact Cum. Error: '.$cumtop.$lf;
$summary[]=$lf;
$summary[]='BR:'.$lf;
$summary[]='---'.$lf;
$summary[]=' Average Error: '.$b0avg.$lf;
$summary[]='Cumulative Error: '.$cumbr0.$lf;
$summary[]='Exact Cum. Error: '.$cumbr.$lf;
$summary[]=$lf;
$summary[]='XR:'.$lf;
$summary[]='---'.$lf;
$summary[]=' Average Error: '.$x0avg.$lf;
$summary[]='Cumulative Error: '.$cumxr0.$lf;
$summary[]='Exact Cum. Error: '.$cumxr.$lf;
$summary[]=$lf;
// Save & Display Results:
foreach ($summary as $line)
{
fwrite($whandle,$line);
echo $line;
}
fclose($whandle);
echo 'Done.'.$p;
?>
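The read loop in the per-inning script leans on a zero-PA test to notice the end of the file. An alternative, sketched here under assumptions, is to let `fgetcsv` signal end-of-file itself. The function name `read_inning_rows` and the key names are labels chosen for this sketch; the column order is the one implied by the positional reads in the script above.

```php
<?php
// Column order follows the positional reads in the per-inning script;
// the key names are labels chosen for this sketch.
const INNING_COLUMNS = array('id', 'ab', 'ubb', 'ibb', 'hb', 'ci', 'outs', 'so',
    'sh', 'sf', 'sgl', 'dbl', 'tpl', 'hr', 'sb', 'cs', 'eb', 'runs');

// Hypothetical helper: read inning lines into keyed arrays, skipping the
// header line and any blank or malformed lines.
function read_inning_rows(string $path): array
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException('Could not open ' . $path);
    }
    $rows = array();
    // fgetcsv() returns false at end-of-file, so no separate feof() test is needed.
    while (($fields = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if (count($fields) !== count(INNING_COLUMNS)) {
            continue; // blank or malformed line
        }
        if (!is_numeric($fields[1])) {
            continue; // header line (column names, not numbers)
        }
        $row = array_combine(INNING_COLUMNS, $fields);
        foreach ($row as $key => $value) {
            if ($key !== 'id') {
                $row[$key] = (int) $value; // counting stats are whole numbers
            }
        }
        $rows[] = $row;
    }
    fclose($handle);
    return $rows;
}

// Usage (file name as in the script above):
// $rows = read_inning_rows('InningData2000-2008.csv');
```

This version holds every row in memory, which is convenient for re-tuning coefficients over repeated passes; for a single pass, the same loop body could do its arithmetic line by line as the original script does.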