Developing duxMLB, an extension for MLB fans
This is the story of how I built duxMLB and what it really is. It was developed for a hackathon, and though I lost steam partway through, I eventually decided to finish it and write about it. The following is the elevator pitch I gave for the official submission.
You can now watch the official YouTube video here as well!
# Inspiration
I found this hackathon through a friend and figured something very interesting could be done with the problem statement. Around a week before this hackathon launched, I was working on an AI-based terminal that I called TerminAI
(I came 4th in that hackathon! It was hosted by Berkeley, so I am extremely happy). It was a LAM-style framework where the user could enter commands in natural language (rather than bash or macOS commands) and have them executed, so things they didn't know how to do in a normal terminal could be done there. I was therefore pretty sold on
the idea of creating LAM-based applications, because I found plain LLMs pretty boring and overdone.
When I heard of this, the first thing that came to my mind was a live statcast generator that would record your screen in real time and generate data from it. This was when Apple Intelligence was gaining hype, and it did something similar, so I figured I would do the same. But I soon realised that, though it would make for a good project, it would be pretty useless, because nowadays statcast data is shown during games!
To figure out what to build, I watched some games. I had never watched baseball before, so it took time to figure out the rules, and man, I got hooked! I won't go into my watching spree because that's probably not needed, but it was a fun time. However, I found myself constantly searching for players and teams, so the best place to start seemed to be a simple LLM where I could type a player's name and get information. Moreover, it had to work while I was watching the games, because I didn't want to open new tabs! That is how the idea of a Chrome extension came to me.
Since then, I have added a lot more functionality to my extension (catering to the given problem statements) and decided to call it duxMLB, that is, "Captain of MLB" in Latin (I like to have fun with names). The next few sections explain the extension in more detail.
# What it does
The following are the features that duxMLB offers:
- A panel on the side for when you are watching live matches. It is Gemini, instructed to be a good baseball coach and to answer related questions.
- It can make three API calls: one for player information, one for team information, and one for the schedule. The user interacts with the panel normally, and when they ask for any of the three, the LLM calls the relevant API and fetches the results.
- The results are pretty printed through another Gemini model (which I call the pretty_printer).
- The panel also lets users grab the last 5 seconds of video they were watching and generate statcast data for it. The timing is crucial for this and the implementation is pretty slick; I will explain how this works in the next section. The statcast data generated is basically just two things:
  - Pitch speed
  - Bat swing speed
  If there are no pitches or swings in those 5 seconds, it will basically output nothing.
- Then there is the options page, which contains three things:
  - The first is the MLB future predictor, which takes in the user's home-run data and, based on similarities with the top players, tells them how good or bad they are. This comparison is done using a dataset that was provided with the problem statement itself and a vectorDB.
  - There is a guide page that also goes into the implementation and functionality (I didn't know I would have to write here as well, so I was planning to use that as the guide).
  - Finally, there is a classics video/name uploader, which users can use to upload their favourite games and generate statcast data for them. Again, statcast data means:
    - Pitch speed
    - Bat swing speed
    If the user doesn't have the video locally (which will usually be the case), they can simply enter the name and the video will be scraped.
So, I have tried to build multiple functionalities mentioned in the problem statement into one product. I had a lot more in mind, but I will talk about that in a later section.
# How I built it
# The overview
First I will talk about the panel. The GUI is pretty basic, built using plain HTML, CSS and JS. This is because I have little experience with web development, and the basic technologies are something I understand and can debug. A Flask backend supports this; Flask because it's pretty easy to set up and I haven't learnt Django or other Python web frameworks.
There is one main file which contains all the necessary endpoints. The background.js and content.js scripts are responsible for figuring out when the extension is supposed to be clickable and usable. This happens only for websites that have youtube.com in the URL (basically YouTube). We could extend this to other MLB streaming sites as well, but for a hackathon run YouTube works best.
The options page for the extension, which contains the other features, also has a simple UI. I have added three functionalities to it, for which I used a host of technologies like pinecone, a vector database used for similarity search, and selenium for web scraping. There is also a guide page which explains what the extension is about along with some technical implementation details.
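To give an idea of the similarity-search part, here is a minimal sketch of how the future predictor can query pinecone. The index name, embedding input and helper names are illustrative assumptions, not the exact ones from my code:

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("mlb-players")  # hypothetical index holding embeddings of top players' stats

def most_similar_players(user_embedding, k=5):
    # Ask the vector DB for the k players whose stat embeddings are closest to the user's
    result = index.query(vector=user_embedding, top_k=k, include_metadata=True)
    return [(match.id, match.score) for match in result.matches]
```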
In the coming subsections I will talk mostly about the backend and not the UI, because the UI is pretty standard stuff and nothing interesting.
# Building the live statcast setup
This setup is the one where you can type "How fast was that swing?" in the panel and get accurate results. The code below is what helps me grab the video feed and store it in a rolling buffer; it lives in the content.js script. At its core it is a capture-phase listener on the video's play event (the handler name here is a stand-in for my actual buffering function):

```javascript
// This is a trigger for the play button. As soon as that is clicked, you can head over to inspect/console to view the frames in the logs.
video.addEventListener("play", startRollingBuffer, true);
```
This buffer is then sent to the /process-video endpoint of the Flask backend, where I store it locally, run my statcast models on it and return an output. The running time for all of this is around 1 sec/frame, which translates to roughly 150 seconds for a 5-second video (on a 10th-gen Core i5 HP laptop; if you have a better CPU then you're lucky).
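On the backend side, the endpoint boils down to something like the following sketch (the field name, file name and run_statcast_models helper are my stand-ins, not necessarily what main contains):

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/process-video", methods=["POST"])
def process_video():
    clip = request.files["clip"]   # the rolling buffer POSTed by the extension
    clip.save("latest_clip.webm")  # store the buffer locally
    stats = run_statcast_models("latest_clip.webm")  # detection + speed pipeline
    return jsonify(stats)          # e.g. {"pitch_speed": ..., "bat_speed": ...}
```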
# Integrating the APIs
The panel can query APIs based on the user's request. The main code for this resides in the API_querying directory of the backend, in a file called query.py. This is a multi-model structure where the user's message first goes through a Gemini model that determines whether an API call is needed, that is, whether the user is asking for one of:
- Player’s data
- Team’s data
- MLB schedule
The thing with the API calls was that the URLs expected codes, that is, numerical values rather than strings for the names. So I first scraped the data using requests and beautiful soup and created a mapping from team and player names to the official codes expected by the API URLs. The model then returns a tuple object that I called name_code_tuple, which says whether this was a player call, a team call or a schedule call, together with the relevant code.
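For a sense of what that mapping looks like: I scraped the pages with requests and beautiful soup, but the same name-to-code pairs can also be pulled straight from the Stats API's /teams endpoint (using that endpoint here is my shortcut for illustration):

```python
import requests

def build_team_code_mapping():
    # Each MLB team comes with a numeric id, which is the "code" the API URLs expect
    response = requests.get("https://statsapi.mlb.com/api/v1/teams", params={"sportId": 1})
    return {team["name"].lower(): team["id"] for team in response.json()["teams"]}

# e.g. build_team_code_mapping()["new york yankees"] -> 147
```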
The querying itself then branches on that tuple; here is the shape of it (identifiers simplified for this write-up, and the Stats API base URL is my assumption about the endpoints in use):

```python
import requests

BASE_URL = "https://statsapi.mlb.com/api/v1"

def query_mlb(name_code_tuple):
    call_type, code = name_code_tuple
    assert call_type in ("player", "team", "schedule"), "unexpected call type"

    if call_type == "schedule":
        # That is both are schedule, we're gonna assume 2024 for now, later we can add the year as well
        url = f"{BASE_URL}/schedule?sportId=1&season=2024"
        response = requests.get(url).json()
        output = ""
        for date in response.get("dates", []):
            for game in date["games"]:
                away = game["teams"]["away"]["team"]["name"]
                home = game["teams"]["home"]["team"]["name"]
                output += f"{game['officialDate']}: " + f"{away} at {home}\n"
        return output

    if call_type == "player":
        # No need to go lower and all here, I have myself defined this
        # This is the relevant url for the player's data
        url = f"{BASE_URL}/people/{code}"
        response = requests.get(url).json()
        # Now parse this
        person = response["people"][0]
        output = ""  # ensuring this is empty
        output += f"{person['fullName']}: {person['primaryPosition']['name']}\n"
        return output

    # team call
    output = ""  # we'll write everything we print here
    url = f"{BASE_URL}/teams/{code}/roster"
    response = requests.get(url).json()
    # Get analysis
    position_counts = {}
    for player in response["roster"]:
        pos = player["position"]["name"]
        position_counts[pos] = position_counts.get(pos, 0) + 1
        output += f"{player['person']['fullName']} ({pos})\n"
    for pos, count in position_counts.items():
        # print(f"{pos}: {count}")
        output += f"{pos}: {count}\n"
    return output
```
Then we query the relevant URL based on this tuple and pass the acquired output through another Gemini model for pretty printing. I have tried to make things easy for this model by defining my own dataclasses, parsing each piece of content acquired through the APIs into them, and storing everything in a single output variable which is then pretty printed.
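For instance, a player result can be boxed into a small dataclass before it reaches the pretty_printer (the fields here are an illustrative subset, not the exact ones in query.py):

```python
from dataclasses import dataclass

@dataclass
class PlayerInfo:
    full_name: str
    position: str
    team: str

    def to_prompt(self) -> str:
        # Flatten into the single string handed to the pretty-printing Gemini model
        return f"Player: {self.full_name} | Position: {self.position} | Team: {self.team}"
```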
# Calculating pitch speed
I first tried to train my own model, which was a big problem (as described in the next section) because the datasets I found online for baseball tracking were not exactly representative of a pitcher throwing a pitch, and hence detection stats were very bad. Then I found this GitHub repo, which I ended up using for the final run. It came with pretrained weights for a YOLO model, trained on baseballs specific to those thrown in an MLB setting by a pitcher.
The following outlines how the speed calculation works (assuming that the input video is of a pitcher throwing the ball):
- The video feed is broken down into frames based on the video's FPS and total duration.
- The model detects the baseball in each frame and stores its center's pixel coordinates in a buffer.
- After the baseball detections are done, we run a different model to detect the pitcher and catcher in each frame and calculate their average positions during the pitch.
- The real distance between the pitching mound and the catcher is around 60 to 61 feet. The calculated average pixel distance between them is correlated with the real distance to find the pixel_to_feet_ratio.
- The ball positions are then grouped into valid sequences: runs of consecutive baseball detections across frames (allowing at most 2 missing detections between valid ones). These are evidence of the baseball actually being pitched.
- For each valid sequence, we find the total distance travelled in pixel coordinates in the X and Y directions (X, Y being the normal coordinate system) and calculate the average velocities in X and Y.
- We then calculate the parabolic average velocity and use the pixel_to_feet_ratio to convert it to ft/s.
- Next we correct for the camera angle using the angle made by the pitcher-catcher line with the normal (that is, the line that splits the screen vertically), with the catcher acting as the pivot (so we first shift the entire pitcher-catcher line mathematically). The math works out to beta = float(np.arctan(2*(x1-x2)/(y2-y1))), where x1, y1, x2, y2 are the average coordinates of the pitcher and the catcher.

I only apply this correction if the angle is greater than 10 degrees, because I found the results are only accurate then. The corrected velocities are the predicted ones divided by the sine of this angle.
In code, the core of the calculation looks like this (identifiers simplified for the write-up):

```python
import numpy as np

MOUND_TO_CATCHER_FT = 60.5  # the real mound-to-plate distance is 60 feet 6 inches

def box_center(box):
    # Detector returns coordinates as (x1, y1, x2, y2)
    x1, y1, x2, y2 = box
    return (x1 + x2) / 2, (y1 + y2) / 2

def pitch_speed(ball_centers, avg_pitcher_box, avg_catcher_box, fps):
    output = ""
    x1, y1 = box_center(avg_pitcher_box)
    x2, y2 = box_center(avg_catcher_box)
    # scale factor is the pixel distance between pitcher and catcher divided by the actual distance between them
    pixel_to_feet_ratio = np.hypot(x2 - x1, y2 - y1) / MOUND_TO_CATCHER_FT
    # The math is explained in the readme docs
    beta = float(np.arctan(2 * (x1 - x2) / (y2 - y1)))

    # Average apparent velocities over the valid sequence, converted from pixels/frame to ft/s
    xs = [c[0] for c in ball_centers]
    ys = [c[1] for c in ball_centers]
    n = len(ball_centers) - 1
    v_x = (xs[-1] - xs[0]) / n * fps / pixel_to_feet_ratio
    v_y = (ys[-1] - ys[0]) / n * fps / pixel_to_feet_ratio
    output += f"apparent v_x: {v_x:.1f} ft/s, " + f"apparent v_y: {v_y:.1f} ft/s\n"
    # Run the beta calculations for each frame
    output += f"camera angle beta: {np.degrees(beta):.1f} degrees\n"

    # if the angle is less than 10 degrees, then the calculations become absurd!
    if abs(np.degrees(beta)) > 10:
        v_x = v_x * 1 / np.sin(beta)  # This necessarily implies that v_real is more than v_app, which is true
        v_y = v_y * 1 / np.sin(beta)

    speed = float(np.hypot(v_x, v_y))  # the parabolic average velocity
    output += f"pitch speed: {speed:.1f} ft/s " + f"({speed * 3600 / 5280:.1f} mph)\n"
    return output
```
# Bat swing speed calculation
These are the steps followed to calculate the maximum swing speed:
- The video is broken into frames based on FPS and total duration.
- The bat detection model runs on each frame and stores the bat's coordinates in a list.
- The pixel_to_ft_ratio is calculated for the video using the bat's length in pixel coordinates divided by its real-life length.
- The bat detections are passed to a univariate spline to get a continuous detection sequence. I then differentiate this spline over small time intervals and find the maximum speed (see the sketch below).
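Roughly, the spline step looks like this with scipy (the variable names are illustrative):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def max_swing_speed(times, bat_xs, bat_ys, pixel_to_ft_ratio):
    # One spline per coordinate, so gaps in detection get smoothed over
    spline_x = UnivariateSpline(times, bat_xs)
    spline_y = UnivariateSpline(times, bat_ys)
    # Differentiate the splines and sample them on a fine time grid
    t = np.linspace(times[0], times[-1], 1000)
    vx = spline_x.derivative()(t) / pixel_to_ft_ratio  # ft/s
    vy = spline_y.derivative()(t) / pixel_to_ft_ratio
    return float(np.max(np.hypot(vx, vy)))  # maximum swing speed in ft/s
```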
# The load models button
This is simple: it's a button that, when pressed, downloads the weights of the trained models onto the user's computer (if I could host the backend, this button wouldn't exist). This is required because otherwise the models load when they are first called, which is infeasible and time-consuming.
The endpoint behind it is essentially just a loop of downloads (the URLs and filenames here are placeholders):

```python
@app.route("/load-models", methods=["POST"])
def load_models():
    weights = {
        "ball_detector.pt": BALL_WEIGHTS_URL,      # placeholder weight URLs
        "bat_detector.pt": BAT_WEIGHTS_URL,
        "player_detector.pt": PLAYER_WEIGHTS_URL,
    }
    for filename, url in weights.items():
        response = requests.get(url)
        with open(filename, "wb") as f:
            f.write(response.content)
    return jsonify({"status": "models loaded"})
```
# Challenges I ran into
There were multiple!
- Hosting it on Render
I had initially hosted the backend on Render, so all the user would have to do is install and play. This also meant changing all the absolute paths in my code to match Render's directory style and installing software specific to it.
This failed, however, because I hit a memory limit exceeded error and it wouldn't run my backend. I agree I haven't optimised anything at all in the codebase, but this still stung. This is why you will have to run the backend file yourself.
- Training a custom model
The problem with training a custom model was just the dataset. I tried florence2 and YOLO on different baseball datasets I found on Roboflow that were specifically annotated for these models, but the prediction confidence was very low and accuracy was also on the floor.
I tried applying custom edits to each frame to make it look similar to the training dataset (changing size, increasing contrast, etc.), and when that didn't work either, I realised this was probably because the dataset contained vastly different environments and sometimes zoomed-in images of the ball. So I had to find a dataset closer to an MLB scenario.
You can find all the test runs in the test directory. After trying 3 different models (and more tested directly on Roboflow) I settled on the one I am using now.
- The math for the camera correction
There were two angles I had to take care of: the normal angle and the azimuthal angle (the two angles in polar coordinates). I couldn't figure out a consistent way to account for the azimuthal angle, but the normal angle, yes.
The exact calculations are described above, but it took three tries to get there. Most of the trouble came from the pixel-to-feet ratio and deciding which angles to actually plug into the formulas. For some test cases I ended up with 1000 mph, which is ridiculous; the angles were so small that dividing by their sine blew up the velocities (sin of 5 degrees is about 0.087, so dividing by it inflates a speed more than tenfold).
# What I learned
This was the first time I tried to build something that involved vector databases, deep learning models and LLMs all in one product. Along the way I learnt first-hand how models like YOLO work, and I dived into the math behind deep learning (in fact, I ended up taking a course at my university on deep learning!). I also dived into the math behind embedding models and vector databases, and tried to code a custom embedding model using transformers for the MLB future predictor. It was pretty fun (a little nerve-wracking too, because there are just so many ways to do this that it was hard to choose).
Using multiple models, creating a pipeline that integrates them all and sketching it out on a Canva whiteboard (haha) was an interesting experience, because I felt like I was developing something that can ACTUALLY make a difference.
# What’s next for duxMLB
Ahh I have a lot of things to write here!
I probably didn't mention this above, but I couldn't work on the project for almost 1.5 weeks in the middle because of a family emergency, so some features I was going to implement never made it in. It might be a stretch to say I could have achieved all of this in 1.5 weeks, but at least some of it, yes.
- Prediction from the user's practice videos
I figured that for a normal user, uploading home-run data like that is infeasible, largely because it requires a very deliberate approach and most people aren't that way. So I wanted a way for users to upload a video of themselves practising a swing or a pitch and compare it to an MLB swing or pitch. I was going to do this by calculating:
- How high they hold the bat from the waist
- Where the maximum velocity of their swing occurs (it should be at contact with the ball, not before)
- Their hand movement when they're about to release the ball (the flick), and more…
This would be fun and I'll do it regardless of the competition (once my exams are over for the semester).
- Figuring out game plans
Since we are already detecting pitchers and catchers, we can do similar things for the other players and then, based on where they are standing, figure out what sort of play is being run. There was one more reason I couldn't implement this: I had absolutely no idea what types of plays exist!
I could research a lot, but to get a feel for what the user really wants from the gameplay (do they want just the name? or the reason? or maybe who came up with it?) I would have to be a long-time fan myself, which wasn't possible since I just started watching baseball!