Monday, March 03, 2008

Out Of The Friaring Pan

Year  Name      PA   AB   R    H    2B  3B  HR  RBI  BB  K    SB  AVG   OBP   SLG   EqA
????  Player A  675  600  101  168  52  3   26  81   58  104  3   .280  .356  .507  .285
????  Player B  675  613  116  180  42  15  20  78   47  83   35  .293  .352  .507  .288
????  Player C  675  615  103  175  55  1   28  99   43  102  5   .285  .336  .514  .282
????  Player D  675  619  111  171  41  13  18  73   46  85   34  .276  .333  .475  .275
????  Player E  675  617  98   173  55  2   28  95   41  113  6   .280  .334  .511  .280
????  Player F  675  620  106  172  38  10  15  64   46  80   35  .277  .330  .447  .268

Alright, of those players which would you take? They all play the same position and defensively let's assume that they are all similar for the sake of simplicity. One of those players actually managed to win an MVP award. Go figure. The actual PAs obviously were not identical, but the stats were scaled to a 675 PA season - assuming that each player would be given the same amount of opportunities if they were on the same team. Alright, pencils down. It was a trick question. It shouldn't have been too hard to figure out that there are only two different players listed, at various levels of success on their own age curve. Players A, C and E are none other than Khalil Greene. Players B, D and F are 2007 MVP Jimmy Rollins.

Player A's stats are the 2006 road stats for Khalil Greene extrapolated out to 675 PAs. Player B's stats are the 2007 road stats scaled to 675 PAs for Jimmy Rollins. They're nearly identical, but of course Khalil Greene gets no love because of his admittedly dreadful home stats for his career. Player C represents the 2006 and 2007 road stats for Khalil Greene and Player D represents the same for Jimmy Rollins. All I can say is it looks like Khalil Greene is the good version of Jhonny Peralta with better defense. Player E would be the last four years on the road for Greene and Player F for Jimmy Rollins, since Citizens Bank Park and PETCO Park both opened in 2004. As one can easily tell, Khalil Greene has been remarkably consistent in his high level of production on the road, and remarkably consistent in his low level of production at home.
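For anyone who wants to reproduce the scaling, here is a minimal sketch of stretching a partial stat line out to 675 PA. The function name and the sample line are made up for illustration; they are not Greene's or Rollins's actual splits. Rate stats like AVG, OBP and SLG do not change under this kind of scaling, only the counting stats do.

```python
def scale_to_pa(line, target_pa=675):
    """Scale every counting stat proportionally so the line covers target_pa PA."""
    factor = target_pa / line["PA"]
    return {stat: round(value * factor) for stat, value in line.items()}

# Hypothetical partial-season line, for illustration only.
sample = {"PA": 450, "AB": 400, "H": 112, "HR": 16, "BB": 40}
scaled = scale_to_pa(sample)
print(scaled)
```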

The last four years Greene's EqA at home has been .258, .248, .225, and .233 with the latter figures being the more recent totals. It's pretty obvious that Khalil Greene has been a replacement level hitter the last couple of years in PETCO. However, on the road those figures are .293, .263, .285 and .279. Those are solid numbers for a shortstop of his defensive caliber and would make him comparable to Miguel Tejada. Yet Greene's EqA overall sits around .260 thanks to him playing half of his games in PETCO.

The simple fact of the matter is that he's about as bad of a fit for PETCO as there is in baseball. In 2007 Khalil Greene hit a flyball 47.2% of the time, placing him 11th in baseball and in the 92nd percentile for the statistic. In 2006 his 46.2% flyball percentage was in the 90th percentile. He was in the 90th percentile in 2005 as well, and was in the 89th percentile in 2004. On the flip side each of those four years he's been in the bottom twelve percent for ground ball rate. Given that PETCO's conditions aren't suitable for home run hitters, it's a logical conclusion that a guy like Khalil Greene ought to struggle there. Large struggles seem to be expected given that he's not just a slight flyball hitter, he's an extreme flyball hitter.

When you run through the data you find that for Padres hitters the correlation between flyball rate and EqA reduction in PETCO is only around .3, but at the same time you also find a bad sample. Over the last four seasons most of the at bats have gone to guys that are above average flyball hitters, similar to Greene. About three-fourths of them are, so the sample really doesn't do the study justice. We really can't conclude either way, although it appears that there is a link between the two.
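As a rough sketch of the kind of check described here, this is a plain Pearson correlation between flyball rate and the home-versus-road EqA drop. The numbers are placeholders invented for illustration, not the Padres data behind the .3 figure. A sample crowded into the flyball end of the range, like the one described, is the sort of thing that can understate a real relationship.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Made-up hitters: flyball rate and (road EqA - home EqA). Illustration only.
flyball_rate = [0.35, 0.38, 0.41, 0.44, 0.47]
eqa_drop = [0.010, 0.030, 0.015, 0.040, 0.020]
r = pearson(flyball_rate, eqa_drop)
print(round(r, 2))
```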

Either way it appears that PETCO is costing Khalil Greene millions of dollars. If he hit the open market today and hit the numbers that are posted above, it's quite likely given his defensive reputation that he would garner a contract near $100M. However, given his struggles in PETCO, he'll be lucky to get half that. It's unfortunate for Greene. The Padres would be smart to realize that he has more value to other teams and that they can probably trade him and come out better. The fact is they're the Padres so he has to play half of his games there. It's an unfortunate situation for both parties. This year we're going to track Greene's road EqA on the left, by his face. I hope you enjoy! We love Khalil around here, dammit.

Sunday, March 02, 2008

Roster Crunch

By my math there are 20 spots locked up on the roster:

1. SP Carlos Zambrano
2. SP Rich Hill
3. SP Ted Lilly
4. RP Bob Howry
5. RP Scott Eyre
6. RP Carlos Marmol
7. RP Kerry Wood
8. RP Michael Wuertz
9. P Ryan Dempster
10. P Jason Marquis
11. P
12. UT
13. C Henry Blanco
14. C Geovany Soto
15. 1B Derrek Lee
16. 2B Mark DeRosa
17. SS Ryan Theriot
18. 3B Aramis Ramirez
19. PH Daryle Ward
20. LF Alfonso Soriano
21. CF
22. RF Kosuke Fukudome
23. OF Matt Murton
24. UT
25. UT

So there are basically two bench spots open, the starting center field position, one pitcher's slot (could be SP or RP) and one spot that could be any position. The candidates are:

Felix Pie - CF

He's off to a great start this spring and had a very good season last year. The general consensus is that I strongly dislike Pie. That's not true, I just think his perceived trade value is greater than his actual value. He's clearly the best decision for the Cubs if they ignore Fukudome in center as a possibility. He'll win the job outright, I think.

Sam Fuld - CF

Sam Fuld is not a very good option in my opinion. I'd like to believe that his minor league walk rates would translate to the majors, but it's not a very likely scenario. Sam Fuld compares favorably to Joey Gathright minus a lot of speed. Fuld's a tremendous defensive player and can be a useful spot starter at any spot in the outfield in a pinch because he won't kill you with his OBP. He's the centerfield version of Ryan Theriot. Joey Gathright has a career .400 OBP in the minors, but a paltry .333 in the Majors. He's walked a decent amount, but ML pitchers learned that he has two professional home runs and aren't afraid of him. Sam Fuld's very similar in that respect. His walk rate could and probably will evaporate. I think he's a great option for the fifth outfield spot. He'll break camp with the team.

Ronny Cedeno - SS

Ronny Cedeno appears to be battling Alex Cintron and Mike Fontenot for what look like two open slots on the bench. He's got a leg up on Fontenot in that he can play shortstop and he's got a leg up on Cintron because he's on the 40 man roster. Ronny's the only pure shortstop even remotely close to making this team. It's unfortunate he's not going to get a chance to reclaim the shortstop position.

Alex Cintron - IF

Alex Cintron has bench experience, which is something managers value for some reason. He's a brick defensively though, so he's not all that great for a utility spot. He's a career .277/.315/.401 hitter, which is serviceable for a backup middle infielder (it's not all that much worse than Theriot). I think he'll make this team. I really do.

Mike Fontenot - IF

I think Fontenot is going to be the odd man out again. It's a shame he's never gotten a chance with this organization. He's been a .280/.370/.450 2B for a couple years now toiling around in the minor leagues. It's too bad the front office hasn't opened their eyes until recently. He's getting old already.

Pitchers later!

A Sign of Things to Come

Following today's 8-6 loss the Cubs moved to 1-3 in the Cactus League. Oh well, it's only March second.

Today the Cubs faced Matt Cain. The Cubs lineup included Ryan Theriot, Kosuke Fukudome, Matt Murton, Ryan Theriot, Derrek Lee and Geovany Soto. They combined to go 4 for 15 with 3 walks. Felix Pie hit his second home run of the spring - more importantly Pie took his second walk. Neal Cotts got thrashed but he's Neal Cotts. Lilly allowed a run in a couple innings of work. Marmol closed out the game. I've got to think he's got the leg up on the closer role since he's coming in the game in the 9th in Spring Training. I don't really understand why considering by then the only players in the game will be in AA during the season.

Saturday, February 16, 2008

Equivalent Average Unmasked

Runs Batted In was created in the late 1800s. A few teams created the statistic to show how good they were. In fact, some sportswriters of the day realized its inherent bias towards hitters in the middle of the order and disregarded it. The little guys with pointy hats and horse-drawn carriages knew what they were talking about. RBI would not surface as a widely accepted statistic until after the dead ball era was over. Eventually it became THE way to grade an offensive player's "production." We all know why it's a bad statistic.

Batting average has its flaws as well. If you go out on the street and ask someone what batting average is, they will respond with something sounding like this: How often a player gets a hit. Wrong. Batting average does not tell us how often a player gets a hit. It tells us how often a player gets a hit while throwing out some of the times he goes up to the plate for no reason other than we feel like it. It also fails to tell us what type of hit the player got. A single is not worth the same as a double. This is why we use on base average and slugging average. Then again, is slugging average really any better? Well, yes and no. It tells you the type of hit, but it still has the first problem of batting average. We're partitioning the times the player comes up to bat and excluding some for inherently biased reasons. Is on base average any better? It fixes the first problem, but fails to solve the second problem of batting average. It acknowledges all plate appearances, but it makes a walk and a home run equal.
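To make the two problems concrete, here is a small sketch using the standard definitions of AVG, OBP and SLG. The two ten-PA hitters are invented for illustration: one does nothing but draw a single walk, the other does nothing but hit a single home run.

```python
def avg(h, ab):
    """Hits per at bat (walks, HBP and sacrifices are thrown out of the denominator)."""
    return h / ab

def obp(h, bb, hbp, ab, sf):
    """Times on base per plate appearance; every way of reaching counts the same."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)

def slg(singles, doubles, triples, hr, ab):
    """Total bases per at bat; weights the type of hit but still ignores walks."""
    return (singles + 2 * doubles + 3 * triples + 4 * hr) / ab

# Hypothetical hitters, 10 PA each, for illustration only.
walker_obp = obp(h=0, bb=1, hbp=0, ab=9, sf=0)    # one walk in 10 PA
slugger_obp = obp(h=1, bb=0, hbp=0, ab=10, sf=0)  # one home run in 10 PA
walker_slg = slg(0, 0, 0, 0, ab=9)
slugger_slg = slg(0, 0, 0, 1, ab=10)
print(walker_obp, slugger_obp, walker_slg, slugger_slg)
```

OBP cannot tell these two hitters apart, and AVG and SLG act as if the walk never happened.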

We can sum on base average and slugging average for OPS, but then again who says that relationship is any better? Instead we can try to develop a system that solves both problems. Enter equivalent average. This post is going to describe anything and everything about EqA so you can come up with the exact EqAs Baseball Prospectus comes up with. One of the criticisms of EqA is that BP develops it in a black box. No one knows how they arrive at it. They do spell out the method here. You can do all the things they do. If you do, you'll find that your league leaders in EqA come out around .300, while BP's EqA leaders are generally around .350 or so. You can play around with the stuff in that article for days and never come up with anything remotely close to their EqA. Sorry. As TangoTiger put it: Opening up the black box will not cause a single dent on [Baseball Prospectus's] bottom line.

What I am going to tell you is everything and why Baseball Prospectus is doing what they do. It's rather simple. In fact it's essentially what people say mathematicians criticize sabermetricians for: Units. People who dislike sabermetrics generally say real mathematicians would hate their "work" because they shed units completely. This really isn't true. Everything in EqA is measured in relatively precise units that in the end cancel out leaving an answer in runs.

Now let's go on and attack the two major problems with oba, slg, and avg. We need to create some sort of rate statistic that includes getting on base and hitting for extra bases as well as stealing a base efficiently. The first thing that is calculated answers all of these problems in what they feel is the best way. We'll call this Raw:

Raw = (SF + SH + 1.5*BB + 1.5*HBP + 1.5*SB + 2*1B + 3*2B + 4*3B + 5*HR)/(SF+SH+BB+HBP+SB+CS+AB)
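A direct transcription of the Raw formula above into code looks something like this. The season line is hypothetical and only there to exercise the function. Note that 2*1B + 3*2B + 4*3B + 5*HR is the same thing as hits plus total bases.

```python
def raw_eqa(singles, doubles, triples, hr, bb, hbp, sb, cs, sf, sh, ab):
    """Raw as written above: scaled bases divided by opportunities."""
    scaled_bases = (sf + sh
                    + 1.5 * (bb + hbp + sb)
                    + 2 * singles + 3 * doubles + 4 * triples + 5 * hr)
    opportunities = sf + sh + bb + hbp + sb + cs + ab
    return scaled_bases / opportunities

# Hypothetical season line, for illustration only.
line = dict(singles=100, doubles=30, triples=5, hr=20,
            bb=60, hbp=5, sb=10, cs=5, sf=5, sh=0, ab=550)
raw = raw_eqa(**line)
print(round(raw, 3))
```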

What is Raw measuring? It's essentially scaled bases per opportunities of moving up a base. Intuitively the idea that walks are worth more than sacs, but not quite as much as singles is good. Raw EqA addresses our two problems effectively, only adding in SB and CS, which can be described as a third problem with each oba, slg and avg. So in the end what does raw measure? Scaled Bases per PA+CS. It gives a numeric value of production. Now we can use Raw and convert it to runs. For a team we do this with this equation:

EqR = (Raw/LgRaw )^2* PA * LgR/LgPA

So what is EqR doing? It takes the team's production relative to what an average team does, squares it, and multiplies it by PA and the runs per PA an average team scores. The squared term is based on the idea that the relationship between Raw/LgRaw and runs is not linear. This makes sense because when you add good hitters your other good hitters get more guys on base and each of their hits causes more runs. Now since we're looking at EqR on a team level and we want it on the player level let's look at that.
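Here is a minimal sketch of the team-level equation. The league totals and team PA are invented round numbers, not real league data. The point is just that an average team gets exactly league-average runs for its PA, and a team whose Raw is 10% better gets 21% more.

```python
def team_eqr(raw, lg_raw, pa, lg_r, lg_pa):
    """Team EqR = (Raw/LgRaw)^2 * PA * LgR/LgPA."""
    relative = raw / lg_raw
    return relative ** 2 * pa * lg_r / lg_pa

# Hypothetical league: Raw of .700, 12,000 runs in 100,000 PA. Illustration only.
LG_RAW, LG_R, LG_PA = 0.700, 12000, 100000
average_team = team_eqr(0.700, LG_RAW, 6200, LG_R, LG_PA)
better_team = team_eqr(0.770, LG_RAW, 6200, LG_R, LG_PA)
print(average_team, better_team)
```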

First, an assumption: The player in question is being analyzed on an average team in his home park. This assumption is needed to derive the equation most people see for EqR. Now, to look at the change in EqR for some change in relative production, take the derivative of EqR with respect to the ratio Raw/LgRaw. We get this equation:

dEqR = 2*Raw/LgRaw*PA*LgR/LgPA

Now we're adding some guy to this team, but a team only has nine slots it can play. So what are we doing? We're replacing an average player on this team and adding this player's production. So basically we have our runs minus an average player's runs in the same PA. We're NOT measuring runs over an average player. We're measuring all of the runs created by a player. So our equation becomes:

dEqR = 2*Raw/LgRaw*PA*LgR/LgPA - PA*LgR/LgPA

Now we can factor out PA*LgR/LgPA resulting in the equation for EqR for a player you'll see at BP, only they drop the dEqR and call it EqR.

EqR = (2*Raw/LgRaw - 1) * PA* LgR/LgPA
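A quick sketch of the player-level formula, again with made-up league numbers. One way to read the (2x - 1) factor, offered here as a gloss rather than as BP's stated reasoning: it is the straight line that touches x^2 at x = 1, so it agrees with the team formula for an average hitter and stays close to it for hitters near average.

```python
def player_eqr(raw, lg_raw, pa, lg_r, lg_pa):
    """Player EqR = (2*Raw/LgRaw - 1) * PA * LgR/LgPA."""
    return (2 * raw / lg_raw - 1) * pa * lg_r / lg_pa

# Hypothetical league totals, for illustration only.
LG_RAW, LG_R, LG_PA = 0.700, 12000, 100000
average_player = player_eqr(0.700, LG_RAW, 675, LG_R, LG_PA)
good_player = player_eqr(0.840, LG_RAW, 675, LG_R, LG_PA)

# Compare the linear factor with the squared team factor near x = 1.
gaps = [(x, (2 * x - 1) - x ** 2) for x in (0.9, 1.0, 1.1)]
print(average_player, good_player, gaps)
```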

Generally people look at that and say what the heck are they doing? Now you know why you're subtracting 1 and multiplying the ratio by two. Here is where we can multiply this by our park factor to normalize for parks, if desired. Now we want to scale EqR to some rate statistic. What should we use? Outs of course. Why? Outs are the stopclock in baseball. We have 9 sets of 3 outs. We can bat as long as we want as long as we don't make those outs. So we decide to make our rate be something close to runs per out used. So then we get this equation, that you can find at BP, albeit not in the article I linked to regarding how to compute EqA (lol).

EqA = (EqR/Out/5)^.4

First let's analyze the "units". We have runs divided by outs, which is what we wanted. Pay no attention to the .4 right now. The thing that should cross your mind is what crosses everyone's mind: Why the hell do they divide by five? WHY? This is where everyone gets lost. In fact if you follow the calculations done in this thread and divide by five you won't get the EqA BP computes. This is the black box, so to speak. Remember, average EqA is supposed to be .260. If you plug all this in you'll get the league average to be about .266 or so, depending on the season. IT DOESN'T WORK. 5 is more or less a constant that forces the average to be equal to .260. How do we do that?

Well League average is going to be (LgR/LgOut/C)^.4. Since we want to "force" EqA to be equal to .260 for an average player, simply set that equation equal to .260 and solve for C. So C =(LgR/LgOut)/.260^2.5. This number tends to be around 5, ranging anywhere from 4.6 (Japan Central League) to about 5.6 (2007 AL). The 2007 National League was about 5.2.
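Solving for C is a one-liner, so here is a small sketch that computes it and checks that a league-average hitter lands on .260. The league run and out totals are hypothetical round numbers, not a real season.

```python
def eqa_constant(lg_r, lg_outs, target=0.260):
    """C = (LgR/LgOut) / target^2.5, the 'five' that forces the league to target."""
    return (lg_r / lg_outs) / target ** 2.5

def eqa(eqr, outs, c):
    """EqA = (EqR/Out/C)^.4"""
    return (eqr / outs / c) ** 0.4

# Hypothetical league: 12,000 runs and 66,000 outs. Illustration only.
c = eqa_constant(lg_r=12000, lg_outs=66000)
league_eqa = eqa(eqr=12000, outs=66000, c=c)
print(round(c, 2), round(league_eqa, 3))
```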

And there, with the above information you can get the exact answers that BP gets for EqA and puts on their player cards. In fact, if you want to you can find out the park factors to extra digits. I've gotten to the point where the average "error" on the EqA I come up with is .000226 compared to theirs. Remember that their EqA is the ring of integers divided by 1000. In other words: It's rounded after three digits. Theoretically, the average error in rounding then will be .00025, which is actually greater than the error I come up with.
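The .00025 figure can be checked with a little arithmetic. This sketch assumes the unrounded values are spread evenly between two neighboring thousandths (that uniformity is an assumption) and works in millionths to keep everything in integers.

```python
STEP = 1000  # millionths in one thousandth (.001)

# Distance from each offset within a thousandth to the nearest thousandth.
errors = [min(k, STEP - k) for k in range(STEP)]

# Average distance, converted from millionths back to EqA units.
mean_error = sum(errors) / STEP / 1_000_000
print(mean_error)
```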

So there you have it. EqA perfectly. Now go look up EqR on BP and you'll see this:

EqR = 5*Out*EqA^2.5

Oh and 1/2.5=.4, so solving that equation for EqA gives us the EqA=(EqR/Out/5)^.4. Look familiar? Oh, but now we're all smart enough to realize that the five isn't five.

And yes, in case you noticed LgRuns gets canceled out. If you plug in everything you get:

EqA = ((2*Raw/LgRaw - 1) * PA * LgR/LgPA / Out * LgOut/LgR * .26^2.5)^.4
EqA = ((2*Raw/LgRaw - 1) * PA/Out * LgOut/LgPA * .26^2.5)^.4

Whenever you want to scale EqA to some league average production based on runs, it's going to cancel out....which of course makes sense.
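To see the cancellation numerically, here is a sketch that computes EqA two ways: step by step through EqR and C, dividing by the player's outs as in EqA = (EqR/Out/C)^.4, and directly from a collapsed form with no league runs in it. All of the inputs are hypothetical.

```python
def eqa_stepwise(raw, lg_raw, pa, outs, lg_r, lg_pa, lg_outs, target=0.260):
    """Go through EqR and the constant C explicitly."""
    eqr = (2 * raw / lg_raw - 1) * pa * lg_r / lg_pa
    c = (lg_r / lg_outs) / target ** 2.5
    return (eqr / outs / c) ** 0.4

def eqa_direct(raw, lg_raw, pa, outs, lg_pa, lg_outs, target=0.260):
    """Collapsed form: league runs never appear."""
    return ((2 * raw / lg_raw - 1) * pa / outs * lg_outs / lg_pa
            * target ** 2.5) ** 0.4

# Hypothetical player and league, for illustration only.
player = dict(raw=0.840, lg_raw=0.700, pa=675, outs=430)
league = dict(lg_pa=100000, lg_outs=66000)
low_scoring = eqa_stepwise(lg_r=9000, **player, **league)
high_scoring = eqa_stepwise(lg_r=12000, **player, **league)
direct = eqa_direct(**player, **league)
print(round(direct, 3))
```

Changing the league run total moves EqR and C by the same factor, so the stepwise answer does not budge.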

Sunday, February 10, 2008

Building A Projection System

Rk   Name        Pos  Act   PECOTA  Mine  Err PECOTA  Err Mine
1.   Rodriguez   3b   .340  .319    .310  .021        .030
2.   Ramirez     ss   .315  .277    .302  .038        .013
3.   Renteria    ss   .297  .262    .259  .035        .038
4.   Rollins     ss   .290  .274    .272  .016        .018
5.   Jeter       ss   .285  .305    .277  .020        .008
6.   Guillen     ss   .283  .306    .291  .023        .008
7.   Reyes       ss   .278  .276    .266  .002        .012
8.   Tejada      ss   .271  .296    .289  .025        .018
9.   Young       ss   .270  .286    .275  .016        .005
10.  Wilson      ss   .269  .247    .246  .022        .023
11.  Eckstein    ss   .266  .247    .257  .019        .009
12.  Greene      ss   .263  .272    .266  .009        .003
13.  Hardy       ss   .261  .254    .241  .007        .020
14.  Sea Bass    ss   .260  .247    .233  .013        .027
15.  Cabrera     ss   .260  .260    .253  .000        .007
16.  Peralta     ss   .259  .281    .265  .022        .006
17.  Loretta     ss   .254  .252    .250  .002        .004
18.  Bartlett    ss   .253  .269    .252  .016        .001
19.  Betancourt  ss   .248  .251    .244  .003        .004
20.  Scutaro     ss   .246  .253    .246  .007        .000
21.  Furcal      ss   .244  .268    .278  .024        .034
22.  Lopez       ss   .239  .272    .268  .033        .029
23.  Drew        ss   .236  .276    .289  .040        .053
24.  Durham      2b   .227  .295    .269  .068        .042
25.  Lugo        ss   .225  .269    .261  .044        .036
26.  Uribe       ss   .222  .263    .228  .041        .006
27.  Vizquel     ss   .221  .264    .242  .043        .021
28.  Crosby      ss   .219  .265    .247  .046        .028
29.  McDonald    ss   .211  .215    .215  .004        .004
30.  Izturis     ss   .210  .234    .221  .024        .011
-    Average     ss   .257  .269    .260  .023        .017

As I sit here working on a simple projection system to evaluate translations from Japan to the United States, I beta ran one of the simple methods I came up with. The method is based on Marcel and I was only trying to project Equivalent Average. I looked at most middle infielders from the 1990s and developed a simplistic general age curve for all of them. I fitted that using a weighted season process similar to the one Marcel uses. I then looked at the set of 2007 SSs with a large amount of PAs and compared the projections versus the actual results for PECOTA and the simplistic method I came up with. Surprisingly the method I devised was more accurate. Weird. In case you're interested, the results are in the table above.
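For a feel of the moving parts, here is a rough sketch of a weighted-season average plus the error measure used in the table. The 5/4/3 weights are the ones commonly associated with Marcel for hitters; whether the method described here used exactly those weights is an assumption, and the age curve is left out entirely. The three past-season EqAs fed to the weighting function are invented. The error check uses the first three rows of the table.

```python
def weighted_eqa(past_eqas, weights=(5, 4, 3)):
    """Weighted mean of past-season EqAs, most recent season first."""
    used = weights[:len(past_eqas)]
    return sum(w * e for w, e in zip(used, past_eqas)) / sum(used)

def mean_abs_error(actual, projected):
    """Average absolute miss between actual and projected values."""
    return sum(abs(a - p) for a, p in zip(actual, projected)) / len(actual)

# Hypothetical three-year EqA history, for illustration only.
baseline = weighted_eqa([0.280, 0.270, 0.260])

# First three table rows: (actual, PECOTA, the simple method).
rows = [(0.340, 0.319, 0.310), (0.315, 0.277, 0.302), (0.297, 0.262, 0.259)]
actual = [row[0] for row in rows]
pecota_error = mean_abs_error(actual, [row[1] for row in rows])
simple_error = mean_abs_error(actual, [row[2] for row in rows])
print(round(baseline, 3), round(pecota_error, 3), round(simple_error, 3))
```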

Saturday, February 09, 2008

Shortstop Rankings

Rk.  Name               Pos  R    HR  RBI  SB  AVG
1.   Hanley Ramirez     ss   112  22  78   41  .309
2.   Jose Reyes         ss   111  14  66   63  .288
3.   Jimmy Rollins      ss   110  21  76   31  .286
4.   Troy Tulowitzki    ss   99   22  92   9   .286
5.   Derek Jeter        ss   102  12  75   15  .306
6.   Carlos Guillen     ss   86   17  83   12  .295
7.   Rafael Furcal      ss   99   10  56   28  .281
8.   Miguel Tejada      ss   77   19  88   4   .298
9.   Michael Young      ss   81   12  80   9   .299
10.  Jhonny Peralta     ss   92   21  84   4   .272
11.  Yunel Escobar      ss   85   8   70   12  .297
12.  JJ Hardy           ss   85   23  84   3   .270
13.  Orlando Cabrera    ss   88   9   70   18  .273
14.  Stephen Drew       ss   77   18  78   8   .264
15.  Edgar Renteria     ss   82   10  61   10  .287
16.  Khalil Greene      ss   74   22  82   5   .252
17.  Julio Lugo         ss   75   7   56   26  .269
18.  Brendan Harris     ss   77   14  74   5   .270
19.  Asdrubal Cabrera   ss   78   9   57   18  .266
20.  David Eckstein     ss   82   4   49   10  .280
-    Average            ss   93   16  77   18  .287
-    Replacement Level  ss   75   9   58   12  .272

Who does not love fantasy baseball? This is the first entry in a series of posts that will rank fantasy players based on their projections for the 2008 season. The projection systems that are used to come up with a player's projections are PECOTA, Bill James' and ZiPS. An estimate was made based on depth charts to see how many plate appearances can be expected by each player, injury likelihood included. This cuts out projections where the PT is low because of flukish injuries, like Derrek Lee's. Shortstop happens to be the position that has the least depth this year, but it's also quite top heavy. Three shortstops are going in the first round. They're elite status, and are top 12 players.

I have several strategies I like to employ at short. The two guys I like to target are Derek Jeter and Stephen Drew. When I draft Jeter I usually do it for his batting average. Drafting his average allows you to invest in guys who are good power hitters but do not hit for a great average. The four guys that immediately come to mind are Ryan Howard, Adam Dunn, Josh Fields and Chris Young. Jeter also adds some runs and some steals on the side. Drew's a bit of the opposite. He has nice power upside, although the projections are not the greatest in the world. He probably won't hit higher than .290, but he's a good gamble late.

I don't advise investing a top five pick on Jose Reyes. I have him rated #10 overall and I just don't see the point of investing in a two category player in the first round. You're limiting yourself way too much.

Reboot #2

Let's see, last time I decided to promise my zero readers that I was going to reboot this blog and make more posts, it lasted all of one post after that one. Maybe this time it will last longer. I would not bet on it, but it is worth a try. We'll see...

Sunday, March 25, 2007

One Week

That's all it is until the beginning of the season. For all of my zero readers, I promise to update this with more regularity than I did in the offseason. You have the word of a liar. Enjoy. But for now, I've got the iPod going, so I can rant a little bit. Today, let's look at the pleasure of our pitching riches and how that shakes down at the five spot.

As of this writing there are still technically three pitchers who have yet to be eliminated for the five slot. That does not mean that each actually has a realistic chance, but oh well. With Angel Guzman, Wade Miller, and Mark Prior all fighting for the spot, I feel so honored. Not really, but whatever.

Wade Miller is the odds-on favorite for the spot. He's no longer the gunner he was with Houston, and now gets by on more finesse and "pitchability." Ironically enough he's displayed a surprisingly high strikeout rate without breaking ninety on the gun. Wade's overall line this spring has been quite impressive, a 3.63 ERA in the Cactus League and a 16:4 K:BB. He has been knocked around for three home runs already, and that could be an issue if he tries to catch a hitter off guard with a supersonic 87 MPH fastball as he adjusts to not hitting 97 with it.

Angel Guzman had the luxury of coming into camp in shape after winter leagues, but had a pretty disappointing spring, in my opinion. He had the five spot by the balls once Prior started sucking. His stuff has looked dominant, but his inability to finish hitters may have cost him a spot in the rotation. His pure stuff has Uncle Lou raving, and rightfully so. I'd also venture to say his inability to finish hitters has made the Cubs worse this season, by him not starting in the rotation. The biggest thing to jump out of his statline is of course the lack of strikeouts. It's a double-edged sword of course. He's walked just one on the spring, but is it a byproduct of him not getting swings and misses? Still at this point, his stuff should have gotten him a bigger chance.

Mark Prior says he is ready to go. The good news is that he's improved his velocity (and performance) with each successive outing. The bad news is that he's only cracked 90 a handful of times this spring. In my opinion, there are two courses of action. They both involve him not opening the season with the Cubs. It may make him mad, but he needs to understand why. While he's made progress, he's at least two or three starts away. He needs to be hitting 90+ consistently with command before he's back. The first scenario is opening with him on the fifteen-day disabled list, which means that he has all of April to "rehab" if need be. He has options left so we could go that route if it looks like he needs more than just April.

Saturday, March 03, 2007

Spring Fever

I know, I really suck at this blogging thing. However now there is at least going to be a constant flow of things to keep my mind on. With the flurry of moves the Cubs still continue to suck. Jason Marquis took the hill the first game of the spring and promptly gave up a run. Awesome. I am not expecting much if anything from the Marquis. God, he sucks. The second game was highlighted by a couple pretty good offensive performances. A two hit game by Mr Izturis - who sucks - and a four RBI game by Matt Murton.

Today's game was a lot better. Instead of being handed their third loss of the young spring they tied the Athletics. Ronny Cedeno had a nice game with a home run off of Dan Haren and a bunt single later on. Wade Miller labored through his couple innings of work. Ryan Dempster had a nice beginning to his spring. Jeff Samardzija also had a perfect inning touching 98. The wheels fell off with Carlos Marmol. Who, in a word, sucked.

Oh well tomorrow looks to be a dandy. We've got Rich Hill versus Jon Garland on WGN. Scott Eyre and a few others are supposed to pitch as well. Former Cubs Andrew Sisco and David Aardsma are slated to go for the White Sox.