Baseball: Season Runs Simulator
About a week ago, White Sox Twitter erupted in debate over the optimal lineup for the Sox to use in the upcoming 2020 season. The White Sox Talk podcast guys at NBC Sports Chicago seemingly kick-started the debate, and it got me thinking: is there a way I can create a program that simulates a season and spits out the number of runs a lineup can score? The answer is yes, but with some caveats that I’ll get into later on. Once I created the program (in Python), I posed the same question to White Sox Twitter and was overwhelmed by the number of responses! In total, 37 unique lineups were proposed and run through the program. I also ran the White Sox Talk Podcast lineups to see how they stacked up against the competition. Here is how everyone fared, so if you participated, check out where you ended up ranking (shout out to Beef Loaf)! If you’re interested in how I set up the logic of the program, make sure you check out my explanation towards the bottom of the page.
Top 5 Lineups:
1. Madrigal, Grandal, Moncada, Encarnacion, Abreu, Jimenez, Robert, Anderson, Mazara
2. Grandal, Moncada, Abreu, Jimenez, Encarnacion, Robert, Anderson, Mazara, Madrigal
3. Grandal, Moncada, Abreu, Jimenez, Encarnacion, Anderson, Mazara, Robert, Madrigal
4. Grandal, Moncada, Jimenez, Encarnacion, Abreu, Anderson, Robert, Mazara, Madrigal
5. Grandal, Encarnacion, Moncada, Madrigal, Abreu, Jimenez, Mazara, Robert, Anderson
What do you notice about these lineups? What jumps out to me is that the highest performing lineups have high On Base Percentage (OBP) hitters at the beginning and end of the lineup. Grandal has the highest projected OBP for the 2020 Sox, followed by Encarnacion, Moncada, and Madrigal. Sandwiching the bashers (Abreu, Encarnacion, Jimenez, Mazara) between high OBP hitters appears to be a recipe for scoring a lot of runs! Granted, this simulator is biased towards OBP (though I’m not sure what projection wouldn’t be).
Forgive me if I’m wrong, but I believe it was @RegionRat14 who asked how a lineup ordered purely by OBP (best to worst) would fare. This lineup produced the 5th most runs! So basing the order purely on OBP doesn’t net you an “optimal” lineup. My original intention was to run every possible batting order of 9 selected players, but the program would literally take 2+ years to run each lineup even once…so that was not feasible. So, thanks for participating, and comment with any questions you may have!
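To give a feel for why trying every batting order is out of reach, here is a rough sketch that counts the orderings of 9 players. The helper name `brute_force_days` is hypothetical, and the time it returns depends entirely on the minutes-per-lineup figure you pass in, so treat it as a back-of-the-envelope tool rather than the calculation behind the estimate above.

```python
import math
from itertools import permutations

# Every batting order of 9 players is a permutation, so there are 9! of them.
N_LINEUPS = math.factorial(9)  # 362,880 possible batting orders

def brute_force_days(minutes_per_lineup, n_lineups=N_LINEUPS):
    """Days of compute needed to simulate every ordering one after another.

    minutes_per_lineup is an assumption supplied by the caller; it is the
    time one lineup takes for however many seasons you choose to simulate.
    """
    return n_lineups * minutes_per_lineup / (60 * 24)

# Small sanity check: 3 players give 3! = 6 possible orders.
small = list(permutations(["Grandal", "Moncada", "Abreu"]))
```

Even at a fraction of a minute per lineup, hundreds of thousands of orderings add up quickly, which is why the crowd-sourced lineups were a practical substitute.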
Fun Graphic Showing Distribution of Runs Across Lineups
Additional Lineups Run
Each of these lineups was taken from RotoChamp.com
Minnesota Twins: 898 runs
Los Angeles Dodgers: 852 runs (w/ Pitcher)
Chicago Cubs: 788 runs (w/ Pitcher)
The White Sox have some ground to make up on the Twins!
Nuts and Bolts of the Simulator Program
Source of Statistics: Fangraphs
I used 2020 projection data for the simulator
There are several caveats to provide before advancing:
Base-running is simulated as “station to station”. By this, I mean that base-runners only advance as much as is dictated by the at-bat occurring. Example: a runner on first base will advance two bases if the current at-bat results in a double.
No stolen bases
I created a feature, BOPO (ball in play out), to capture plate appearances where the batter puts the ball in play but it results in an out (ground ball, line drive, fly ball). Each of these has a percentage chance attached to it that allows any current base-runners to advance. Ground balls have the highest chance, followed by fly balls, then line drives.
Does not account for pinch hitting or any lineup swapping. It is purely static.
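As an illustration of the “station to station” caveat, here is a minimal sketch of how runners could be moved on a hit. The function name `advance_runners` and the three-item list for the bases are my own choices for the example, not necessarily how the actual program stores things, and walks (where only forced runners move) are left out.

```python
def advance_runners(bases, n_bases):
    """Move every runner and the batter exactly n_bases (station to station).

    bases: list of three booleans for first, second, and third base.
    n_bases: 1 for a single, 2 for a double, 3 for a triple, 4 for a home run.
    Returns (new_bases, runs_scored).
    """
    runs = 0
    new_bases = [False, False, False]

    # Existing runners move exactly n_bases; anyone who passes third scores.
    for i, occupied in enumerate(bases):
        if occupied:
            target = i + n_bases
            if target >= 3:
                runs += 1
            else:
                new_bases[target] = True

    # The batter lands on the base matching the hit, or scores on a home run.
    if n_bases >= 4:
        runs += 1
    else:
        new_bases[n_bases - 1] = True

    return new_bases, runs
```

With a runner on first, `advance_runners([True, False, False], 2)` leaves runners on second and third with no run scored, which matches the double example in the caveat above.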
Okay, so how does it work?
I take the first batter in the lineup and evaluate their OBP, K%, and BOPO (1 – (OBP + K%)). I generate a random number between 0 and 1, and if it’s less than or equal to OBP, the batter reaches base. If it’s between OBP and OBP + K%, then the result is a strikeout and no runners advance. If it’s greater than OBP + K%, then the ball is in play but is an out, and there is an opportunity for base-runners to advance.
If the batter reaches base: Take Single %, Double %, Triple %, Home Run %, and BB %. I generate another random number between 0 and 1 and determine what the outcome is. Base-runners advance appropriately.
If batter strikes out: Out += 1
If batter is out, but ball is in play: Take Fly Ball %, Ground Ball %, and Line Drive %. Generate another random number between 0 and 1 and use the result to determine if the ball is a ground ball, fly ball, or line drive. Each outcome has a predetermined % chance that base-runners can advance. A random number is generated, and if it is less than or equal to those odds, the runners advance one base. If not, they stay put.
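The three branches above can be sketched roughly like this. All function names here are hypothetical, and the `ADVANCE_ODDS` values are made-up placeholders: only their ordering (ground ball, then fly ball, then line drive) comes from the description. Treat this as one possible reading of the logic, not the actual code.

```python
import random

# Placeholder odds that runners move up one base on a ball-in-play out.
# The numbers are invented for illustration; only the ordering
# (ground ball > fly ball > line drive) comes from the write-up.
ADVANCE_ODDS = {"ground_ball": 0.30, "fly_ball": 0.20, "line_drive": 0.10}

def plate_appearance(obp, k_pct, rng=random):
    """First draw: does the batter reach base, strike out, or make a BOPO?"""
    r = rng.random()
    if r <= obp:
        return "on_base"
    if r <= obp + k_pct:
        return "strikeout"
    return "ball_in_play_out"  # the remaining 1 - (OBP + K%) share

def pick_weighted(shares, rng=random):
    """Second draw: pick one key from a dict of name -> share.

    Shares are scaled by their total, so they do not have to sum to 1.
    """
    total = sum(shares.values())
    r = rng.random() * total
    cumulative = 0.0
    for name, share in shares.items():
        cumulative += share
        if r <= cumulative:
            return name
    return name  # guard against floating point round-off on the last item

def runners_advance(batted_ball, rng=random):
    """Third draw on a BOPO: do the runners move up one base?"""
    return rng.random() <= ADVANCE_ODDS[batted_ball]
```

The same `pick_weighted` helper would work for both the hit-type draw (single, double, triple, home run, walk) and the batted-ball draw (fly ball, ground ball, line drive), since each is just a pick from a set of percentages.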
This is repeated for 9 innings and the runs are recorded. Afterward, everything is reset to 0 and the game is simulated 162 times to make up a season.
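Here is a rough sketch of how the inning, game, and season loops could fit together. The `resolve_pa` callback is a hypothetical stand-in for the plate appearance logic described above, and two details are assumptions on my part: that the batting order carries over from inning to inning, and that each game starts back at the top of the order.

```python
def simulate_season(lineup, resolve_pa, games=162, innings=9):
    """Return total runs scored over a season.

    resolve_pa(batter, bases) is a hypothetical callback that plays one
    plate appearance and returns (new_bases, runs_scored, outs_made).
    """
    season_runs = 0
    for _ in range(games):
        game_runs = 0
        batter_idx = 0  # assumption: every game starts at the top of the order
        for _ in range(innings):
            outs = 0
            bases = [False, False, False]  # bases are cleared each inning
            while outs < 3:
                batter = lineup[batter_idx % len(lineup)]
                bases, runs, new_outs = resolve_pa(batter, bases)
                game_runs += runs
                outs += new_outs
                batter_idx += 1  # assumption: order carries over between innings
        season_runs += game_runs  # record the game, then start fresh
    return season_runs


# Toy usage: a resolver where every batter makes an out scores nothing.
always_out = lambda batter, bases: (bases, 0, 1)
total = simulate_season(["Batter"] * 9, always_out)
```

Passing the plate appearance logic in as a function keeps the loop itself short, and it makes it easy to swap in a simpler or faster version of the at-bat code when testing.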
To gain a better understanding of performance, each lineup was run for 10 seasons. I would have preferred to run each one 1,000 times, but it takes about 7 minutes to run 10 seasons for one lineup, so that wouldn’t be feasible with the code I’ve written. If you can write it more efficiently, please do! I’m new to programming and am still learning. Run time was on my mind, and I took some measures to preserve memory, but perhaps there are places where this can be improved!
Want to see the code? Here's a link to my Github page. It's also under the "Portfolio" tab.