Baseball players talk in cliches. But so do the rest of us.

Publish date: 2024-08-11

About this story

Our methodology was computational at the beginning and subjective at the end.

We started with about 7,000 Major League Baseball interview transcripts compiled by ASAP Sports, mostly from press conferences at playoffs and All-Star Games. We transformed the text into a database containing questions, answers and metadata about the answers, then extracted four- and five-word phrases and calculated a PMI (pointwise mutual information) score for each. (PMI compares how often words appear together with how often they would if they occurred independently; the higher the score, the more likely the phrase is a cliche.) We eliminated phrases that showed up fewer than seven times and had PMI scores of less than 25. We used the Python library NLTK for the text analysis.
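For readers curious about the scoring step, here is a minimal sketch of one common way to compute PMI for multi-word phrases. It uses only the Python standard library rather than NLTK and is not the code used for this story. The function name `ngram_pmi`, its parameters and the toy token list are assumptions made for illustration.

```python
import math
from collections import Counter


def ngram_pmi(tokens, n, min_count=1):
    """Score each n-word phrase by PMI.

    PMI here is log2(P(phrase) / product of P(word) for each word),
    with every probability estimated as count / total tokens.
    This is a simplification chosen for illustration.
    """
    total = len(tokens)
    word_counts = Counter(tokens)
    phrase_counts = Counter(
        tuple(tokens[i:i + n]) for i in range(total - n + 1)
    )

    scores = {}
    for phrase, count in phrase_counts.items():
        # Skip rare phrases, whose PMI estimates are unreliable.
        if count < min_count:
            continue
        log_joint = math.log2(count / total)
        log_independent = sum(
            math.log2(word_counts[word] / total) for word in phrase
        )
        scores[phrase] = log_joint - log_independent
    return scores


# Toy example: "x y" appears twice in six tokens.
toy_tokens = ["x", "y", "x", "y", "z", "z"]
toy_scores = ngram_pmi(toy_tokens, n=2, min_count=2)
```

In this sketch, a phrase whose words almost always travel together scores high, while a phrase made of words that are common everywhere scores low. A score cutoff like the one described above would be applied to the returned dictionary afterward.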

We grouped phrases that were variations of one another (differing by only one or two words) into a list of roughly 20,000 possible cliches. Then came the subjective part. From that list, we chose the most interesting ones, then grouped those with similar meanings. And voila: the phrases we considered to be the cream of the cliche crop.
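The grouping of near-identical phrases can be sketched roughly as follows. This is an illustrative guess at one way to do it, not the method used for this story: it measures the difference between two phrases as a word-level edit distance and greedily assigns each phrase to the first group whose first member is close enough. The names `word_edit_distance` and `group_variants`, the `max_diff` parameter and the sample phrases are all hypothetical.

```python
def word_edit_distance(a, b):
    """Levenshtein distance counted in whole words, not characters."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(
                prev[j] + 1,         # delete a word from a
                curr[j - 1] + 1,     # insert a word from b
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def group_variants(phrases, max_diff=2):
    """Greedily group phrases within max_diff words of a group's first member."""
    groups = []
    for phrase in phrases:
        words = phrase.split()
        for group in groups:
            if word_edit_distance(words, group[0].split()) <= max_diff:
                group.append(phrase)
                break
        else:
            # No existing group is close enough, so start a new one.
            groups.append([phrase])
    return groups


sample_phrases = [
    "one game at a time",
    "one day at a time",
    "one pitch at a time",
    "give credit to their pitcher",
]
sample_groups = group_variants(sample_phrases)
```

Because the assignment is greedy, the result depends on the order of the input list. A more careful approach might compare each phrase against every member of a group rather than only the first.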

Photos by Brian Bahr/Getty Images, Eliot J. Schechter/Getty Images, Heather Hall/AFP/Getty Images, Dave Sandford/Getty Images, Lisa Blumenfeld/Getty Images, Kevin C. Cox/Getty Images, Maddie Meyer/Getty Images, Nick Wass/AP Photo, Gary A. Vasquez/USA Today Sports, Ezra Shaw/Allsport/Getty Images, AP Photo/Stacy Bengs, Christian Petersen/Getty Images, Rob Carr/Getty Images
