Pitch Type Sequence Similarity Ratio: Understanding the Role Pitch Sequencing Plays in the MLB

Ajay Patel and Sean Sullivan
Jun 1, 2023

Taking a deep dive into pitch sequencing was not something that Ajay and I initially talked about when we began discussing baseball research ideas that we could collaborate on. The idea came about after I watched part of the San Diego Padres and Milwaukee Brewers game on April 15th, 2023. During the bottom of the 5th inning, Tom Verducci commented that Freddy Peralta was able to get to the third time through the Padres’ order because of his ability to switch up his pitch sequencing. I kid you not, on the next pitch, Jake Cronenworth smacked a changeup for a home run. It would have been easy to write off what Verducci said given the ironic timing of his comment, but it seemed worth exploring. We ultimately decided that we wanted to understand whether a pitcher’s pitch sequence similarity had any relationship to various pitching performance metrics, and we wanted to build a framework that would make our research practically useful.

Before going into more detail, let’s first align on what we mean when we talk about pitch sequencing. In our case, we are talking about the order in which pitch types are thrown during a plate appearance. This isn’t the only way to think about pitch sequencing. We would be remiss to not consider location sequencing (where the ball is being pitched) as well as a combination of pitch type and location sequencing. But for this initial exploration, we focused on pitch type sequencing.

Figure 1: Example of a Plate Appearance Pitch Type Sequencing

There has been some previous work in this space, but we wanted to pursue a different methodology. We explored a variety of approaches to this research topic and ultimately landed on a simple solution. At its core, understanding how similar a pitcher’s pitch type sequencing is requires knowing how many matching elements, or subsequences, are present when comparing two or more plate appearance pitch type sequences to each other. Luckily for us, the SequenceMatcher class from the difflib module in Python’s standard library provides the ability to do just that. From SequenceMatcher’s documentation: “This is a flexible class for comparing pairs of sequences of any type, so long as the sequence elements are hashable. The basic algorithm predates, and is a little fancier than, an algorithm published in the late 1980’s by Ratcliff and Obershelp under the hyperbolic name “gestalt pattern matching.” The idea is to find the longest contiguous matching subsequence that contains no “junk” elements; these “junk” elements are ones that are uninteresting in some sense, such as blank lines or whitespace. (Handling junk is an extension to the Ratcliff and Obershelp algorithm.) The same idea is then applied recursively to the pieces of the sequences to the left and to the right of the matching subsequence. This does not yield minimal edit sequences, but does tend to yield matches that “look right” to people.” (Link to Documentation).

The class also allows us to generate a Sequence Similarity Ratio (SSR), which expresses how similar two sequences are to each other. An SSR of 1 means that they are a perfect match. An SSR of 0 means there are no matching subsequences. As a rule of thumb from the difflib documentation, an SSR greater than 0.6 indicates a close match. The actual equation for the SSR is fairly simple and an example is provided below.

Figure 2: SSR Explained
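
For readers who want to try the ratio themselves, here is a minimal sketch that calls SequenceMatcher directly. The two plate appearances are made up, the pitch type codes (FF, SL, CH, CU) are just illustrative abbreviations, and `ssr` is a hypothetical helper name rather than anything from difflib. Per the difflib documentation, `ratio()` returns 2.0*M/T, where M is the number of matching elements and T is the total number of elements in both sequences; the sketch recomputes that by hand as a check.

```python
from difflib import SequenceMatcher


def ssr(seq_a, seq_b):
    """Sequence Similarity Ratio between two pitch type sequences."""
    # autojunk=False disables difflib's "popular element" heuristic. It only
    # applies to sequences of 200+ items, so it is a no-op for plate
    # appearances, but it makes the intent explicit.
    return SequenceMatcher(None, seq_a, seq_b, autojunk=False).ratio()


# Two made-up plate appearances (illustrative only).
pa_1 = ["FF", "SL", "FF", "CH"]
pa_2 = ["FF", "SL", "CU", "CH"]

# Recompute the ratio by hand from the matching blocks: 2.0 * M / T.
matcher = SequenceMatcher(None, pa_1, pa_2, autojunk=False)
matches = sum(block.size for block in matcher.get_matching_blocks())
total = len(pa_1) + len(pa_2)

print(matches, total)          # 3 matching elements out of 8 total
print(2.0 * matches / total)   # 0.75
print(ssr(pa_1, pa_2))         # 0.75
```

Here the matcher first finds the contiguous block FF, SL, then recurses to the right of it and finds CH, so M = 3 and T = 8.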

With SequenceMatcher in hand, we queried pitch-by-pitch data from Statcast via the pybaseball Python package for the 2021, 2022, and 2023 (through 5/25/23) seasons. We did some basic data manipulation to get the pitch type sequences for every pitcher in our query and ran them through our program to generate full-season results for 2021 and 2022 and partial results for 2023. Before generating SSRs for more specific scenarios, we first wanted to understand two things. First off, is a pitcher’s Overall SSR sticky? In other words, is Overall SSR in Year X correlated with, and predictive of, Overall SSR in Year X+1? To find out, we filtered to all pitchers who threw at least 100 innings in 2022 (131 pitchers) and compared their 2021 Overall SSR to their 2022 Overall SSR. We found that the r-squared value was 0.500 and the Pearson correlation coefficient was 0.707, which indicated a strong relationship between the two values.
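
To make the data manipulation step a bit more concrete, here is one plausible way to go from pitch-by-pitch rows to an Overall SSR. Treat it as a sketch under assumptions, not the exact program behind the results in this post: the column names (`pitcher`, `game_pk`, `at_bat_number`, `pitch_number`, `pitch_type`) are what we would expect from a Statcast pull, the rows are made up, `build_sequences` and `overall_ssr` are hypothetical helper names, and averaging the ratio over every pair of plate appearances is just one possible way to aggregate.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations
from statistics import mean


def build_sequences(rows):
    """Group pitch rows into ordered pitch type sequences, keyed by pitcher."""
    by_pa = defaultdict(list)
    for row in rows:
        # One plate appearance = one (pitcher, game, at-bat) combination.
        key = (row["pitcher"], row["game_pk"], row["at_bat_number"])
        by_pa[key].append((row["pitch_number"], row["pitch_type"]))

    sequences = defaultdict(list)
    for (pitcher, _game, _at_bat), pitches in by_pa.items():
        # Sort by pitch number so the sequence reflects the order thrown.
        sequences[pitcher].append([ptype for _num, ptype in sorted(pitches)])
    return dict(sequences)


def overall_ssr(pa_sequences):
    """ASSUMPTION: mean SequenceMatcher ratio over every pair of plate appearances."""
    # Note: difflib warns that ratio() can depend on argument order, so a
    # different pairing order could shift the result slightly.
    ratios = [
        SequenceMatcher(None, a, b, autojunk=False).ratio()
        for a, b in combinations(pa_sequences, 2)
    ]
    return mean(ratios) if ratios else float("nan")


# Made-up rows for a hypothetical pitcher (id 1), deliberately out of order.
columns = ("pitcher", "game_pk", "at_bat_number", "pitch_number", "pitch_type")
raw = [
    (1, 10, 1, 2, "SL"), (1, 10, 1, 1, "FF"), (1, 10, 1, 3, "CH"),
    (1, 10, 2, 1, "FF"), (1, 10, 2, 2, "SL"),
    (1, 10, 3, 1, "CU"), (1, 10, 3, 2, "CU"),
]
rows = [dict(zip(columns, r)) for r in raw]

sequences = build_sequences(rows)
print(sequences[1])
print(overall_ssr(sequences[1]))
```

Comparing every pair grows quadratically with the number of plate appearances, which is manageable for a single pitcher-season but worth keeping in mind when running every pitcher across several seasons.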

Figure 3: Correlation Graph
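
The two stickiness numbers are directly related: for a simple linear fit of one variable on another, r-squared is the square of the Pearson coefficient, and 0.707 squared is roughly 0.500. A minimal sketch of the calculation is below. The SSR values are invented for illustration (they are not our pitcher data), and `pearson_r` is a hypothetical helper written out by hand so the arithmetic is visible.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length (at least 2 points)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return covariance / sqrt(var_x * var_y)


# Invented Overall SSR values for five hypothetical pitchers.
ssr_year_x = [0.30, 0.35, 0.40, 0.45, 0.50]
ssr_year_x1 = [0.32, 0.33, 0.44, 0.41, 0.52]

r = pearson_r(ssr_year_x, ssr_year_x1)
print(r, r ** 2)  # Pearson r and the corresponding r-squared
```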

Second, we wanted to compare their 2022 Overall SSR to a set of pitching performance metrics to determine whether there was any relationship between a pitcher’s sequencing and their performance. The results varied, but Overall SSR showed a weak relationship with metrics such as Swinging Strike %, SIERA, Batting Average Against, and FIP. Between the year-to-year stickiness of a pitcher’s Overall SSR and some relationship to widely adopted performance metrics, we felt confident that there was some juice behind it and that it was worth moving forward with.

Figure 4: Comparing 2022 Overall SSR to Other Metrics

With the details out of the way, let’s move on to the fun stuff: the results. Below is a table of the “Leaders and Laggers” in Overall SSR for the 2023 season. We ran this data through 5/25/23 and filtered the results to show only pitchers who had faced at least 150 batters. The filter is arbitrary, but we wanted a threshold that allowed for a decent sample size given that we are only about a quarter of the way through the 2023 MLB regular season. In our opinion, it is advantageous for a pitcher to have an Overall SSR that is closer to 0, which indicates that they are mixing up their pitch type sequencing. Values closer to 1 indicate that they have more matching subsequences from plate appearance to plate appearance.

Figure 5: 2023 Results

Seeing Yu Darvish as the leader in Overall SSR is no surprise. Statcast has logged Darvish throwing 8 distinct pitch types this year! He’s thrown a Sweeper (23%), a 4-Seam Fastball (19%), a Slider (18%), a Cutter (12%), a Sinker (11%), a Split Finger (10%), a Curveball (8%), and a Changeup (<1%) in 2023. Just looking at his pitch type distribution already hints at his sequencing being less similar, given that he not only has a large arsenal to choose from but also throws 6 of those pitches at least 10% of the time. At the bottom is Hunter Greene, who is almost a two-pitch pitcher. He’s thrown a 4-Seam Fastball (54%), a Slider (40%), and a Changeup (6%) in 2023. Given that he’s only working with 3 pitches and that two of them account for about 94% of his pitches, it is not a surprise that his pitch type sequencing is very similar.

Simply knowing whose pitch type sequencing is most dissimilar or most similar is not all that insightful. We wanted to take it a step further and provide some ideas on how something like this could be used in a practical way. Below, we have a scouting report for San Diego Padres pitcher Blake Snell. Snell is nothing spectacular when it comes to SSR, as he is in the 25th percentile for Overall SSR. We chose Snell because he is in the 99th percentile for the difference between a pitcher’s Overall SSR and the SSR of their plate appearances from the 1st and 3rd Time Through the Order (yes, Rays fans, we are bringing this up again). In Snell’s case, his sequences from the 1st and 3rd Time Through the Order are more similar to each other than his Overall SSR would suggest, meaning that he’s throwing similar subsequences and thus can be more predictable for teams and players that are paying attention.
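
A comparison like 1st vs 3rd Time Through the Order can be sketched by scoring every plate appearance in one group against every plate appearance in the other. The version below is only an illustration: the sequences are made up, `cross_group_ssr` is a hypothetical helper name, and averaging over all cross-group pairs is an assumed aggregation rather than a statement of exactly how the numbers in the report were produced.

```python
from difflib import SequenceMatcher
from itertools import product
from statistics import mean


def cross_group_ssr(group_a, group_b):
    """ASSUMPTION: mean ratio over every (group_a, group_b) pair of sequences."""
    ratios = [
        SequenceMatcher(None, a, b, autojunk=False).ratio()
        for a, b in product(group_a, group_b)
    ]
    return mean(ratios) if ratios else float("nan")


# Made-up plate appearances from a hypothetical start.
first_tto = [["FF", "SL", "CH"], ["FF", "FF", "SL"]]
third_tto = [["FF", "SL", "CH"], ["CU", "FF", "SL"]]

print(cross_group_ssr(first_tto, third_tto))
```

A value well above the pitcher's Overall SSR would suggest he is leaning on the same subsequences when hitters see him again.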

Figure 6: Scouting Report

In the scouting report, you will see a breakdown of his pitch type usage, his location data, and pitch movement information. You’ll also see a “Sequencing Summary” where his Overall SSR, the Pitches Per Batter Faced, and Pitch Types Per Batter Faced are shown. We also highlight the most common two and three pitch type sequences. The Lineup Slot Sequence Similarity Ratio Matrix allows us to compare sequence similarities between different spots in the lineup. In Snell’s case, he attacks lead-off and two-hole batters quite differently than he does batters in the 3rd-5th slots. Lastly, the Sequence Similarity Deep Dive provides us with the SSR of the following cuts: 1st Time Through the Order (TTO) vs 2nd TTO, 1st TTO vs 3rd TTO, 2nd TTO vs 3rd TTO, the SSR of plate appearances ending in a strikeout, the SSR of plate appearances ending in a walk, the SSR of plate appearances ending in a hit (with the caveat that these are often shorter sequences, given that once a ball is put in play, the plate appearance is over), and the SSR when there are men on base. The “vs Overall Ratio” column is the absolute value of the difference between the Overall SSR and the SSR of the category being compared. The accompanying “vs Overall Percentile” is meant to help contextualize the result. Otherwise, it would be difficult to understand why the difference between Snell’s Overall Ratio and his 1st TTO vs 3rd TTO is worth discussing.
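
Two pieces of the report are easy to sketch: the most common two and three pitch type sequences, and the “vs Overall Ratio” with its percentile. Everything below is illustrative. The plate appearances and league-wide differences are invented, the helper names are hypothetical, and the percentile convention (share of pitchers with a difference less than or equal to this one) is an assumption, since several conventions exist.

```python
from collections import Counter


def common_subsequences(pa_sequences, n, top=3):
    """Most frequent runs of n consecutive pitch types within plate appearances."""
    counts = Counter(
        tuple(seq[i:i + n])
        for seq in pa_sequences
        for i in range(len(seq) - n + 1)  # empty when a sequence is shorter than n
    )
    return counts.most_common(top)


def vs_overall_ratio(overall_ssr, category_ssr):
    """Absolute difference between the Overall SSR and one category's SSR."""
    return abs(overall_ssr - category_ssr)


def percentile_rank(value, population):
    """ASSUMPTION: percent of population values less than or equal to value."""
    return 100.0 * sum(v <= value for v in population) / len(population)


# Made-up plate appearances for a hypothetical pitcher.
pas = [["FF", "SL", "FF"], ["FF", "SL", "CH"], ["SL", "FF", "SL"]]
print(common_subsequences(pas, n=2, top=1))  # [(('FF', 'SL'), 3)]

# Invented numbers: one pitcher's difference against a tiny "league".
diff = vs_overall_ratio(0.40, 0.45)
league_diffs = [0.01, 0.02, 0.05, 0.15]
print(percentile_rank(0.05, league_diffs))  # 75.0
```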

If a deeper look is desired, then a post-game summary, like the one below, could be useful. Such a look would give a team an explicit understanding of how a pitcher attacked their batters in a given game. In our example, we continue our profile on Blake Snell and showcase how he pitched to the Los Angeles Dodgers on May 12, 2023. From this report, you can quickly see how he attacked different batters and that his sequencing was very similar when comparing batters’ second and third plate appearances (granted, many of the third plate appearances resulted in batted ball events, but the subsequences were still similar, which may have aided in the balls being put in play).

Figure 7: Post-Game Report

Moving forward, we intend to update the Pitch Type Overall SSR throughout the 2023 season and highlight interesting trends and examples that we find while mining through the results. We will also be exploring Pitch Zone Sequence Similarity and a combined Pitch Type and Zone Sequence Similarity. Initial findings show that the SSRs are much lower, given that there are typically more zones and pitch type + zone combinations than there are distinct pitch types. We also intend to explore how pitchers’ SSR differs when facing Right Handed Batters and Left Handed Batters, as well as how their SSR differs in various leverage situations.
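
Because SequenceMatcher accepts any hashable elements, a combined pitch type and zone sequence can be sketched by using (pitch type, zone) tuples as the elements. The example below is only an illustration of why the combined SSRs come out lower: the plate appearances and zone numbers are made up, and `ssr` is a hypothetical helper name.

```python
from difflib import SequenceMatcher


def ssr(seq_a, seq_b):
    """Sequence Similarity Ratio for sequences of any hashable elements."""
    return SequenceMatcher(None, seq_a, seq_b, autojunk=False).ratio()


# Made-up plate appearances as (pitch type, zone) tuples; zones are illustrative.
pa_1 = [("FF", 5), ("SL", 14), ("FF", 2)]
pa_2 = [("FF", 5), ("SL", 9), ("FF", 11)]

# Pitch type only: drop the zone from each element.
types_1 = [pitch_type for pitch_type, _zone in pa_1]
types_2 = [pitch_type for pitch_type, _zone in pa_2]

type_ssr = ssr(types_1, types_2)   # identical type sequences
combined_ssr = ssr(pa_1, pa_2)     # only the first pitch matches on type AND zone

print(type_ssr, combined_ssr)
```

The same pitch types thrown to different zones no longer count as matches, so the combined ratio can only be equal to or lower than the type-only ratio for the same pair of plate appearances.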

We hope that you’ve enjoyed this analysis! As always, please feel free to reach out with any comments, questions, or constructive criticism. Thanks for checking this out!

URAM ANALYTICS

Packages Used:

Example Notebook
