Exit Velocity Over Expected (EVOE): My MS of Data Science Capstone Project
To complete the requirements of my graduate program at DePaul University, I needed to complete a machine learning/data science/analytics project and decided to continue with my theme of focusing on the use of data within the context of Major League Baseball. Without further ado, I present to you: Exit Velocity Over Expected!
Hitting, in baseball, is quite simple: see the ball, hit the ball. But, like most things in life, it is not as simple as it sounds. Imagine having the task of facing down a pitcher, as they unleash a fastball moving towards you at 97 miles per hour and spinning at a rate of 2,300 rotations per minute. See the ball, hit the ball, right?
As the game of baseball has advanced through the years, pitchers and hitters have been in a constant arms race to outcompete each other. In the 1990s and early 2000s, the home run was king thanks to players like Ken Griffey Jr., Mark McGwire, Sammy Sosa, and Barry Bonds. In the 2010s, pitchers’ ability to throw the ball hard advanced at a rate unprecedented in Major League Baseball’s (MLB) history. From 2008 to 2018, the league average fastball velocity rose from about 91.5 mph to 92.8 mph. That seems like a small amount, but here’s another way to look at the league-wide increase in velocity: in 2008, the league rate for throwing fastballs 95 mph or faster was just over 12%. By 2018, it had risen to about 22% (1). In essence, pitchers are throwing harder than ever, so how can hitters continue to find success?
Just as pitchers have advanced in their development and understanding of their craft, hitters have made similar strides. In today’s MLB, the prime strategy of hitters is to focus on launch angle (the angle at which the ball leaves the bat after contact) and exit velocity (how fast the ball leaves the bat, in mph). Taking it a step further, players are focusing on obtaining a “barrel” classification on a batted ball event. A batted ball qualifies when it has an exit velocity of at least 98 mph and, at that speed, a launch angle between 26 and 30 degrees; the qualifying range of launch angles widens as exit velocity increases (2). The reason hitters want to hit the ball hard is that the harder the ball is hit, the more likely the outcome is to favor the batter. In Figure 1 below, recorded exit velocities from batted ball events in the 2020 MLB season were put in tiers and the hit success (AVG) was aggregated for each tier. The relationship between a higher exit velocity and a successful batted ball outcome is clear to see.
Figure 1: Batting Average and Exit Velocity Tiers
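For anyone who wants to rebuild a table like Figure 1, here is a minimal sketch of the tiering step. The tier edges and the sample rows are assumptions made up for illustration; the article does not list the exact cut points, and this is not the project's actual code.

```python
from collections import defaultdict

# Hypothetical tier edges in mph (assumption: the real cut points are not given).
TIER_EDGES = [80, 90, 95, 100, 105]


def tier_label(exit_velocity):
    """Return a label such as '<80', '90-95', or '105+' for one batted ball."""
    lower = None
    for edge in TIER_EDGES:
        if exit_velocity < edge:
            return f"<{edge}" if lower is None else f"{lower}-{edge}"
        lower = edge
    return f"{lower}+"


def average_by_tier(batted_balls):
    """batted_balls: iterable of (exit_velocity_mph, is_hit) pairs.

    Returns hits divided by batted balls for each exit velocity tier.
    """
    hits = defaultdict(int)
    total = defaultdict(int)
    for exit_velocity, is_hit in batted_balls:
        label = tier_label(exit_velocity)
        total[label] += 1
        hits[label] += int(is_hit)
    return {label: hits[label] / total[label] for label in total}


# Made-up rows for illustration only, not real 2020 data.
sample = [
    (72.0, False), (78.5, True), (93.1, False), (97.4, True),
    (101.3, False), (106.0, True), (108.2, True),
]
tier_averages = average_by_tier(sample)
print(tier_averages)
```

With real Statcast data, each pair would come from the `launch_speed` column and a flag for whether the batted ball went for a hit.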
With this in hand, the focus of this project was to understand the mechanics of the power struggle between pitchers and hitters. Specifically, I set out to model exit velocity and to create a metric that indicates how well, or poorly, a player hit the ball based on the attributes of a pitch. Such a metric, which I call Exit Velocity Over Expected (EVOE), would serve two purposes: providing insights for in-game strategies and serving as a tool to evaluate players when constructing a roster. The beauty of this metric is that it provides insights not only for batters but also for pitchers, since they can be evaluated on the exit velocity generated off the pitches they throw.
So, what the heck is EVOE? How does it work?
Before jumping into a brief overview of the model and how it works, let's first talk about the data. Everything used for this project was made possible by James LeDoux, who created a Python package called pybaseball that lets a user query baseball data. I specifically used it to query Statcast data from Baseball Savant's website. If you're reading this and unfamiliar, go check out Baseball Savant - it is an incredibly cool website. For context, Statcast was introduced to MLB parks in 2015 and is a combination of radar and camera technology that takes measurements of ball and player movement within the field of play.
As mentioned, I used the pybaseball package to query data from the 2017 - 2020 regular seasons and postseasons. Since this project focused only on exit velocity, I only included batted ball events with an associated exit velocity reading. I also did some QA checks to make sure there weren't any crazy outliers that would mess up the model. With the data nice and clean, I also had to make sure that I was including relevant and appropriate variables. I only wanted to focus on variables that describe a pitch, so I only included things like: velocity of the pitch, spin rate, horizontal and vertical movement, location of where the ball crossed the plate, etc. Now that I had the data ready to roll, I was ready to model!
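The cleaning steps above can be sketched roughly like this. It is a minimal illustration, not the code used for the project: the column names follow Baseball Savant's Statcast export (`launch_speed` is exit velocity, `type == "X"` marks a ball in play), while the outlier bounds and the exact feature list are assumptions. With pybaseball, rows like these would typically come from something like `statcast(start_dt=..., end_dt=...)` converted to a list of records.

```python
import math

# Pitch-describing Statcast columns (assumed feature list; the article says
# "velocity, spin rate, horizontal and vertical movement, location, etc.").
PITCH_FEATURES = [
    "release_speed", "release_spin_rate",  # velocity (mph) and spin (rpm)
    "pfx_x", "pfx_z",                      # horizontal / vertical movement
    "plate_x", "plate_z",                  # where the ball crossed the plate
]
TARGET = "launch_speed"  # exit velocity in mph

# Hypothetical sanity bounds for exit velocity; the real QA rules are not stated.
MIN_EV, MAX_EV = 20.0, 125.0


def _missing(value):
    """True for None or NaN, the two ways a reading can be absent."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def clean_batted_balls(rows):
    """Keep balls in play with complete pitch data and a plausible exit velocity."""
    kept = []
    for row in rows:
        if row.get("type") != "X":  # drop balls and strikes not put in play
            continue
        if any(_missing(row.get(name)) for name in PITCH_FEATURES + [TARGET]):
            continue
        if not MIN_EV <= row[TARGET] <= MAX_EV:
            continue
        columns = ["pitch_type"] + PITCH_FEATURES + [TARGET]
        kept.append({name: row.get(name) for name in columns})
    return kept


# Made-up rows for illustration only.
base = {
    "type": "X", "pitch_type": "SI", "release_speed": 97.0,
    "release_spin_rate": 2182.0, "pfx_x": -1.2, "pfx_z": 0.6,
    "plate_x": -0.9, "plate_z": 1.4, "launch_speed": 94.0,
}
rows = [
    base,
    {**base, "type": "S"},                      # not a batted ball
    {**base, "launch_speed": float("nan")},     # no exit velocity reading
    {**base, "release_spin_rate": None},        # incomplete pitch data
    {**base, "launch_speed": 180.0},            # implausible outlier
]
cleaned = clean_batted_balls(rows)
print(len(cleaned))
```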
Without getting into the weeds about modeling, which I will happily discuss with anyone who is interested, I ultimately settled on a tuned XGB Regressor model. The way the model works is it trains on a set of data so that it can learn the patterns present within the dataset and then apply what it learned to new, unseen data. I trained the model on the 2017 - 2019 data and then ran the 2020 data to generate the EVOE metric.
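To make the train-on-the-past, score-the-new-season flow concrete, here is a minimal sketch. Big caveat: the model below is a deliberately simple stand-in (the training-set mean exit velocity per pitch type), not the tuned XGBoost regressor used for the project. It only shows the shape of the workflow; `game_year` is the Statcast season column, and everything else is an assumption for illustration.

```python
from collections import defaultdict
from statistics import mean


def split_by_season(rows, train_seasons=(2017, 2018, 2019), test_season=2020):
    """Split on season so the model never sees the year it is scored on."""
    train = [r for r in rows if r["game_year"] in train_seasons]
    test = [r for r in rows if r["game_year"] == test_season]
    return train, test


class PitchTypeMeanModel:
    """Stand-in model (NOT XGBoost): predicts the training mean per pitch type."""

    def fit(self, rows):
        by_type = defaultdict(list)
        for r in rows:
            by_type[r["pitch_type"]].append(r["launch_speed"])
        self.overall_ = mean(r["launch_speed"] for r in rows)
        self.means_ = {ptype: mean(vals) for ptype, vals in by_type.items()}
        return self

    def predict(self, rows):
        # Fall back to the overall mean for pitch types unseen in training.
        return [self.means_.get(r["pitch_type"], self.overall_) for r in rows]


# Made-up rows for illustration only.
rows = [
    {"game_year": 2017, "pitch_type": "FF", "launch_speed": 90.0},
    {"game_year": 2018, "pitch_type": "FF", "launch_speed": 92.0},
    {"game_year": 2019, "pitch_type": "CU", "launch_speed": 84.0},
    {"game_year": 2020, "pitch_type": "FF", "launch_speed": 95.0},
    {"game_year": 2020, "pitch_type": "SL", "launch_speed": 88.0},
]
train, test = split_by_season(rows)
model = PitchTypeMeanModel().fit(train)
predicted = model.predict(test)
print(predicted)
```

Swapping in a real regressor would mean replacing `PitchTypeMeanModel` with a model that has the same `fit` and `predict` steps and is trained on the full set of pitch features.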
Okay, but what the heckin heck is Exit Velocity Over Expected? Simply stated: given the attributes of a pitch (speed, spin rate, movement, location, etc.), by how many miles per hour (mph) does a player over- or under-hit the ball? Well, how do you determine if a player over- or under-hits a ball? Glad you asked! That's where the modeling comes in. To make this determination, I took the exit velocity that the model predicted and subtracted it from the actual exit velocity of that batted ball event (yes, it's simply the residual...for my fellow data nerds). If this number is positive, the player hit the ball harder than expected. If it is negative, the player hit the ball softer than expected. Here's an example:
On August 25th, the Chicago Cubs played against the Detroit Tigers and Spencer Turnbull threw a sinker to Javier Baez that resulted in an EVOE of 15 mph. This is because the model predicted that a sinker, thrown outside the strike zone low and inside, with a spin rate of 2,182 rpm, and thrown at 97 mph would result in a batted exit velocity of 79 mph. But, Javy actually hit it 94 mph. Subtracting the predicted exit velocity (79 mph) from the actual exit velocity (94 mph) resulted in an EVOE of +15 mph.
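In code, the calculation is a one-liner. This small sketch just reproduces the arithmetic from the Baez example above; the function name is mine, not part of any library.

```python
def evoe(actual_exit_velocity, predicted_exit_velocity):
    """Exit Velocity Over Expected: actual minus predicted, in mph."""
    return actual_exit_velocity - predicted_exit_velocity


# Numbers from the Baez example: predicted 79 mph, actual 94 mph.
baez_evoe = evoe(actual_exit_velocity=94, predicted_exit_velocity=79)
print(f"{baez_evoe:+d} mph")  # +15 mph
```

A positive value means harder contact than the model expected for that pitch, and a negative value means softer contact.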
My guy Beef Loaf (@MrDeclicious13) asked, on a teaser Twitter thread about this project, what it meant for a player to have an exit velocity over or under expected. It was still in the early stages of the project, but I suspected that it had to do with a player's ability to hit elite, or difficult to hit, pitches better than a typical MLB player. Take the Javy Baez example: across all 2020 batted ball events on sinkers thrown outside of the strike zone low and inside (to right-handers), at a velocity greater than 96 mph and a spin rate greater than 2,180 rpm, the average EVOE was 1.25 mph. Javy's was 15 mph. Whether that came from pure luck or from his ability to read the pitch well enough to produce a favorable exit velocity, a single event can't say; an analysis of a player's trends would better speak to whether it was luck or skill.
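The comparison described above boils down to filtering batted balls to a pitch profile and averaging their EVOE. Here is a minimal sketch under a few assumptions: each event already carries an `evoe` value, and `location` is a hypothetical precomputed label (Statcast itself reports plate coordinates and a numbered `zone`, not a label like this). The sample rows are made up.

```python
from statistics import mean


def similar_pitch_mean_evoe(events, pitch_type, location, min_velocity, min_spin):
    """Average EVOE over batted balls whose pitch matches the given profile."""
    matches = [
        e["evoe"]
        for e in events
        if e["pitch_type"] == pitch_type
        and e["location"] == location          # hypothetical label, see lead-in
        and e["release_speed"] > min_velocity  # mph
        and e["release_spin_rate"] > min_spin  # rpm
    ]
    return mean(matches) if matches else None


# Made-up events for illustration only.
events = [
    {"pitch_type": "SI", "location": "low_inside_out", "release_speed": 97.0,
     "release_spin_rate": 2200.0, "evoe": 2.0},
    {"pitch_type": "SI", "location": "low_inside_out", "release_speed": 96.5,
     "release_spin_rate": 2300.0, "evoe": -1.0},
    {"pitch_type": "SI", "location": "low_inside_out", "release_speed": 98.1,
     "release_spin_rate": 2250.0, "evoe": 5.0},
    {"pitch_type": "SI", "location": "low_inside_out", "release_speed": 93.0,
     "release_spin_rate": 2250.0, "evoe": 9.0},   # too slow, excluded
    {"pitch_type": "FF", "location": "low_inside_out", "release_speed": 97.0,
     "release_spin_rate": 2400.0, "evoe": 4.0},   # not a sinker, excluded
]
group_mean = similar_pitch_mean_evoe(
    events, pitch_type="SI", location="low_inside_out",
    min_velocity=96, min_spin=2180,
)
print(group_mean)
```

One event far above the group mean does not settle luck versus skill on its own; repeating the comparison across many of a player's batted balls is what starts to show a trend.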
Use Cases and the Value of the Metric
Now that you understand the mechanics of the metric and how it works, let's talk about how this can be used and where value can be found. In my opinion, the true value of the metric comes from drilling down to a specific player's EVOE profile. To provide an example, Rafael Devers, of the Boston Red Sox, had an average EVOE of 7.28 mph for the 2020 MLB season, which means that on average, he hit the ball about 7 mph harder than expected. A helpful way of breaking a player down is to evaluate their EVOE performance against specific pitch types, zone locations, and combinations of pitch type and zone location. Additionally, evaluating the average pitch speed and spin rate of those pitches, along with the resulting launch angle, helps in understanding the trends present in the data.
Evaluating Devers’ pitch-specific success, he saw his highest average EVOE against curveballs (10.70 mph), changeups (9.59 mph), and sliders (8.88 mph). This is particularly interesting because these pitches tend to be thrown at lower speeds and with higher spin rates than other pitches. While the lower speed may be seen as an advantage for a batter, the spin rate and the combination of horizontal and vertical movement as the ball approaches the plate make these pitches difficult for the batter to track and thus difficult to hit. To support this assessment, the league-wide average exit velocity by pitch type for the 2020 season shows that the four-seam fastball has the highest average exit velocity at 91 mph, whereas the average exit velocities for curveballs, changeups, and sliders are in the 85-86 mph range. Compared to the 2020 MLB average exit velocities for these pitches, Devers hit curveballs about 4 mph harder, changeups about 9 mph harder, and sliders about 9 mph harder.
Taking a deeper look at his performance based on the zone in which the ball crossed the plate, it was interesting to find that 32% of Devers’ batted ball events in 2020 occurred outside of the strike zone. Not only did this account for a large portion of his batted balls, but the average EVOE for these batted balls was 7.62 mph, which was higher than his overall average. Additionally, the 7.62 mph was the highest average EVOE, by a player, for balls hit outside of the strike zone in MLB for 2020. His overall success against curveballs carried over to curveballs hit outside of the strike zone. Devers had 11 batted ball events on curveballs thrown outside of the strike zone, with an average EVOE of 15.18 mph. Of the 11 batted ball events, 6 were against curveballs in the 70th percentile, or better, for curveball spin rate, and 5 of the 11 were successful hits (.455 batting average). So, it cannot be said that Devers took advantage of subpar curveballs, since he found great success hitting higher quality curveballs thrown outside of the zone. A caveat to this assessment is that EVOE only considers batted ball events and not all pitches thrown. It is entirely possible that Devers rarely hits curveballs outside of the zone, but when he does, he hits them quite hard.
Another way to use the EVOE metric to evaluate a player is to investigate where they struggle, rather than only where they succeed. Continuing to use Devers as an example, despite his success out of the strike zone, he struggled to generate strong contact when thrown a sinker out of the zone. A sinker is a pitch that has a hard, downward movement and is known for inducing groundballs. Of his six batted ball events on sinkers outside the strike zone, he had a positive EVOE on only two of them, with an overall average EVOE of -6.14 mph and a batting average of .167. For context, his average EVOE for sinkers hit within the strike zone was 6 mph. With this information, an opposing team would be wise to attack Devers with sinkers thrown out of the strike zone, specifically inside, closer to the batter, since he is likely to make weaker contact than expected. An important caveat is that this is a small sample size, which should be kept in mind when formulating an in-game strategy for how to pitch to Devers.
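A pitch-type breakdown like this one is a simple group-and-average over a player's batted balls. The sketch below assumes each event already carries an `evoe` value and a hypothetical `batter_name` field (Statcast identifies batters by a numeric ID); the rows are made up and are not Devers' real data.

```python
from collections import defaultdict
from statistics import mean


def mean_evoe_by_pitch_type(events, batter_name):
    """Average EVOE per pitch type for one batter, highest first."""
    grouped = defaultdict(list)
    for e in events:
        if e["batter_name"] == batter_name:
            grouped[e["pitch_type"]].append(e["evoe"])
    summary = {ptype: round(mean(vals), 2) for ptype, vals in grouped.items()}
    return dict(sorted(summary.items(), key=lambda kv: kv[1], reverse=True))


# Made-up events for illustration only.
events = [
    {"batter_name": "Player A", "pitch_type": "FF", "evoe": 4.0},
    {"batter_name": "Player A", "pitch_type": "CU", "evoe": 12.0},
    {"batter_name": "Player A", "pitch_type": "FF", "evoe": 2.0},
    {"batter_name": "Player A", "pitch_type": "CU", "evoe": 9.0},
    {"batter_name": "Player B", "pitch_type": "CU", "evoe": -3.0},
]
profile = mean_evoe_by_pitch_type(events, "Player A")
print(profile)
```

Running the same grouping over every batter in the league gives the baseline to compare a single player's profile against.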
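Because splits like "sinkers out of the zone" get small quickly, it helps to carry the sample size along with the average. Here is a minimal sketch, assuming each event has an `evoe` value and the Statcast `zone` column, where values 1-9 fall inside the strike zone and 11-14 fall outside. The `min_events` threshold of 25 is an arbitrary assumption, and the rows are made up rather than Devers' real data.

```python
from statistics import mean

# Statcast `zone` values 1-9 are inside the strike zone; 11-14 are outside.
IN_ZONE = set(range(1, 10))


def zone_split(events, pitch_type, min_events=25):
    """Summarize EVOE in and out of the zone for one pitch type.

    min_events is a hypothetical cutoff for flagging small samples.
    """
    groups = {"in_zone": [], "out_of_zone": []}
    for e in events:
        if e["pitch_type"] != pitch_type:
            continue
        key = "in_zone" if e["zone"] in IN_ZONE else "out_of_zone"
        groups[key].append(e["evoe"])
    return {
        key: {
            "n": len(values),
            "mean_evoe": round(mean(values), 2) if values else None,
            "small_sample": len(values) < min_events,
        }
        for key, values in groups.items()
    }


# Made-up events for illustration only.
events = [
    {"pitch_type": "SI", "zone": 5, "evoe": 6.0},
    {"pitch_type": "SI", "zone": 4, "evoe": 8.0},
    {"pitch_type": "SI", "zone": 13, "evoe": -7.0},
    {"pitch_type": "SI", "zone": 14, "evoe": -5.0},
    {"pitch_type": "SI", "zone": 13, "evoe": 3.0},
    {"pitch_type": "CU", "zone": 13, "evoe": 10.0},
]
split = zone_split(events, "SI")
print(split)
```

A flagged split is still worth a look; it just should not drive a game plan on its own.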
Figure 2: Rafael Devers Batted Ball EVOE Chart
Taking this approach to evaluating a player with EVOE will provide insights similar to what was discussed regarding Rafael Devers. A batter’s overall average EVOE is indicative of their general performance, but the real value of the metric comes from taking a specific look at a player’s performance. The same analysis can be applied to pitchers as well, but in their case, negative, or lower, EVOEs are desired since they indicate weaker batted ball contact. To allow for further investigation, Google Data Studio dashboards were created (one for batters and one for pitchers). They allow a user to select a player and evaluate their EVOE success against specific pitches and in specific zones.
By no means do I believe this is a catch-all, perfect evaluation metric. It has its flaws - primarily that it is based on batted ball events and does not take into account all pitches thrown. BUT, I do believe that this metric begins to dig into what makes a player better, or worse, at hitting a ball. The same goes for pitchers, in reverse: what makes them better, or worse, at inducing softer contact. I also believe that much of what could take this analysis to the next step lies within the baseball biomechanics research world, regarding batters' abilities to rapidly diagnose the pitch thrown and choose an appropriate action for success (swing or don't swing).
As always, let me know what you think! I greatly enjoy your thoughts and comments, so please do not hesitate to reach out. In the meantime, check out the dashboards for yourself and find your favorite player's strengths and weaknesses! Additionally, check out some sample code that I've put up on my GitHub. It's not everything I used, but enough to get you on your way.
Special thanks to my girlfriend, family, and friends throughout my time in the Data Science program at DePaul. Without their patience and support, I know this degree would have been much more of a struggle. Additionally, thanks to Ben Draus and Tej Seth for serving as my technical resources.
1. Sullivan J. The Velocity Surge Has Plateaued [Internet]. FanGraphs Baseball. 2019. Available from: https://blogs.fangraphs.com/the-velocity-surge-has-plateaued/
2. Glossary: Barrel [Internet]. MLB.com. [cited 2021 Mar 14]. Available from: https://www.mlb.com/glossary/statcast/barrel