# Cat Pereira

Full-stack software engineer. I build things across the stack, from polished product UIs to cloud services and ML tooling. Lately I'm deep in computer vision and audio.

Site: https://catpereira.dev

## Projects

### Otterwatch

Live status board that watches sea otter cams, detects when an otter is on screen, and notifies you when one appears.

- Computer vision detection pipeline (YOLO-World + CLIP) running over public otter cam feeds
- Static React app on Vercel subscribing to Supabase Realtime for live sightings with thumbnails
- Pushes sightings to both the website and Discord

- [GitHub](https://github.com/catherinepereira/otterwatch)

**Tech:** Computer Vision, Postgres, Python, React, TypeScript, Docker, Vite

### MH Nature Cam

Wildlife cam board that watches Morten Hilmer's 24/7 woodland livestream and records a clip around each animal sighting.

- Open-vocabulary YOLOE detection running over the live feed, built on detstream
- A detstream worker writes MP4 clips, peak JPEGs, and a SQLite index that the board reads and plays back
- Embeds the live stream and lists recent sightings as they land

- [GitHub](https://github.com/catherinepereira/mh-nature-cam)

**Tech:** Computer Vision, Python

### explorable.cv

The home for my computer vision explorables!

- [Project Link](https://explorable.cv)
- [GitHub](https://github.com/catherinepereira/explorablecv)

**Tech:** Computer Vision, React, TypeScript, Vite

### detstream

Modular object detection framework for live video feeds. The engine behind Otterwatch.

- Pulls from YouTube and RTSP sources, detects with YOLO-World, and tracks with hysteresis and cooldown to avoid alert spam
- Pluggable sinks for console, Supabase, and Discord, configured with a single YAML file
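
A minimal sketch of the hysteresis-and-cooldown idea, not detstream's actual code. The class name, field names, and default values are hypothetical. The tracker only flips to "present" after several consecutive detections, only flips back after several consecutive misses, and suppresses a new alert if the previous one was too recent:

```python
from dataclasses import dataclass


@dataclass
class PresenceTracker:
    """Illustrative only; names and defaults are assumptions, not detstream's API."""

    enter_frames: int = 3      # consecutive hits needed to declare "present"
    exit_frames: int = 5       # consecutive misses needed to declare "absent"
    cooldown_s: float = 60.0   # minimum seconds between two alerts

    present: bool = False
    _hits: int = 0
    _misses: int = 0
    _last_alert: float = float("-inf")

    def update(self, detected: bool, now: float) -> bool:
        """Feed one frame's detection result; return True if an alert should fire."""
        if detected:
            self._hits += 1
            self._misses = 0
        else:
            self._misses += 1
            self._hits = 0

        if not self.present and self._hits >= self.enter_frames:
            self.present = True
            # Cooldown: stay quiet if the last alert was too recent.
            if now - self._last_alert >= self.cooldown_s:
                self._last_alert = now
                return True
        elif self.present and self._misses >= self.exit_frames:
            self.present = False
        return False


tracker = PresenceTracker(enter_frames=2, exit_frames=2, cooldown_s=10.0)
frames = [True, True, False, False, True, True]
alerts = [tracker.update(hit, now=float(t)) for t, hit in enumerate(frames)]
# alerts == [False, True, False, False, False, False]
```

The second appearance (frames 4 and 5) flips the state back to "present" but falls inside the 10-second cooldown, so no second alert fires.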

- [GitHub](https://github.com/catherinepereira/detstream)
- [PyPI](https://pypi.org/project/detstream/)

**Tech:** Computer Vision, Python, Docker

### Sign Cards

Browser game for learning ASL fingerspelling, with themed levels and webcam flashcards.

- Per-letter recognition from MediaPipe hand landmarks and an ONNX alphabet classifier
- Motion letters (J, Z) scored by a CTC fingerspelling model, sharing the OpenHand pipeline
- Themed word lists that unlock in order, with letter tiles that fill in as you sign

- [Project Link](https://sign-cards.vercel.app)
- [GitHub](https://github.com/catherinepereira/sign-cards)

**Tech:** Computer Vision, React, TypeScript, Vite

### airdraw

Draw on screen by pinching your fingers in the air, tracked through your webcam.

- Hand tracking via MediaPipe with pinch-to-draw, running entirely in the browser with no backend
- Color picker, eraser, brush sizing, background toggle, and per-stroke undo/redo
- PNG export with a preview that swaps between camera, background, and transparent

- [Project Link](https://airdraw-cat.vercel.app)
- [GitHub](https://github.com/catherinepereira/airdraw)

**Tech:** Computer Vision, React, TypeScript, Vite

### OpenHand

Web app that converts American Sign Language fingerspelling to text in real time using a webcam.

- Per-frame letter detection (A-Z) using an MLP and streaming phrase transcription using a CTC transformer
- Learn screens with animated reference clips and optional text-to-speech output
- Isolated word recognition against 250 common ASL signs (model only)
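
To show how a CTC model's per-frame outputs turn into text, here is a minimal greedy-decoding sketch. It is not taken from OpenHand: the function name, the `_` blank symbol, and the sample frame labels are assumptions for illustration. Greedy CTC decoding collapses consecutive repeats, then drops the blank token:

```python
from itertools import groupby

BLANK = "_"  # assumed blank symbol for this sketch


def ctc_greedy_collapse(frame_labels: list[str], blank: str = BLANK) -> str:
    """Collapse runs of identical labels, then remove blanks."""
    collapsed = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in collapsed if label != blank)


# One label per video frame (the most likely class at each step).
frames = list("HH_EE_LL_LL__OO")
decoded = ctc_greedy_collapse(frames)  # "HELLO"
```

The blank between the two runs of `L` is what lets a doubled letter survive the collapse step.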

- [Project Link](https://openhand-asl.vercel.app)
- [GitHub (Frontend)](https://github.com/catherinepereira/openhand)
- [GitHub (Model)](https://github.com/catherinepereira/openhand-model)

**Tech:** Computer Vision, PyTorch, Python, React, TypeScript, Vite

### Captionaut

Video captioning app with automated transcription, speaker diarization, and inline caption editing.

- Drag-and-drop upload with live transcription progress via Whisper and optional speaker diarization
- Inline editor with click-to-seek, keyboard shortcuts, undo/redo, and vocal isolation via Demucs
- Export to .srt/.vtt or render captions directly into video files, with auto-save and project thumbnails

- [GitHub](https://github.com/catherinepereira/captionaut)

**Tech:** Audio ML, FastAPI, Python, React, Docker, TypeScript, Vite

### prompt2dataset

CLI tool that generates labeled image datasets from plain-English descriptions using Claude AI.

- Fetches and deduplicates images from multiple sources (DuckDuckGo, Wikimedia, iNaturalist, Openverse)
- Interactive image review and validation, then fine-tune image classifiers with automatic learning rate detection

- [GitHub](https://github.com/catherinepereira/prompt2dataset)
- [PyPI](https://pypi.org/project/prompt2dataset/)

**Tech:** Computer Vision, LLMs, Python, PyTorch

### cli-cards

Renders the terminal-style usage cards shown across this site. Works as a CLI, an npm library, and a browser-based card editor.

- Describe each card (title, colors, command lines) in a config file and it screenshots a styled terminal window to PNG using headless Chrome
- The web editor builds a card with a live preview and downloads the PNG, reusing the same renderer as the CLI

- [Card Editor](https://cli-cards.vercel.app)
- [npm](https://www.npmjs.com/package/cli-cards)
- [GitHub (CLI)](https://github.com/catherinepereira/cli-cards)
- [GitHub (Editor)](https://github.com/catherinepereira/cli-cards-web)

**Tech:** TypeScript, React, Vite

### airship.top

A website I hosted to track live player counts for the multiplayer game platform Airship, built on GCP Cloud Run, Cloud Scheduler, Firebase Hosting, and Neon.

- [GitHub](https://github.com/catherinepereira/airship-top)

**Tech:** Google Cloud Platform, Nest.js, React, TypeScript, Docker, Postgres, Prisma

### Airship
*Easy Games (2025 - 2026)*

Worked on platform microservices, designing and implementing features end to end across our stack, from database structure (Postgres, Prisma) to API design (NestJS, REST) to user interfaces (TypeScript, React, Svelte).

- Designed and built the Airship Developer Fund with fund distribution auditing, earnings breakdown, and payout tooling
- Built a playtime tracking data pipeline using PubSub and BigQuery with Terraform-provisioned infrastructure
- Built admin tooling for payouts and added user payout information forms and validation with Tipalti
- Added multi-provider account linking with Firebase (Google, Apple, Steam), plus automatic Steam friends detection and cross-platform friend requests using the Steamworks API
- Designed and built platform moderation tools and dashboards for internal moderation with auditing, along with game-scoped moderation tools for developers with a full permission system
- Built a content moderation pipeline using GCP's moderation API, with confidence-level and threshold tuning, applied across all visible text inputs
- Added player count metrics using Prometheus with live display in Grafana
- Added user-facing dashboards and webpages for features such as live queue times, organization member and data management, and account management

- [Project Link](https://airship.gg)

**Tech:** Google Cloud Platform, Nest.js, Postgres, React, TypeScript, Agones, BigQuery, BunnyCDN, CockroachDB, Docker, Firebase, Grafana, JWT, Kubernetes, Prisma, Prometheus, PubSub, Redis, Sentry, Steamworks, Svelte, Terraform

### dinnote

Package that builds on my dinscribe package to add speaker diarization to my audio preprocessing pipeline.

- [GitHub](https://github.com/catherinepereira/dinnote)
- [PyPI](https://pypi.org/project/dinnote/)

**Tech:** Audio ML, Python

### dinscribe

Package that chains several audio preprocessing libraries to clean noisy audio and detect speech.
Designed to improve the speed and accuracy of transcription with OpenAI's Whisper model.

- [GitHub](https://github.com/catherinepereira/dinscribe)
- [PyPI](https://pypi.org/project/dinscribe/)

**Tech:** Audio ML, Python

### doctape

CLI tool that converts large PDFs to Markdown by chopping them into page windows, running each through docling, and reassembling the result.

- Per-window processing keeps memory bounded and makes long jobs resumable after an interruption
- Optional EasyOCR pass for scanned pages and stylized cover art
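
The windowing idea can be sketched roughly as follows. This is not doctape's code: the function names, the 1-based inclusive page ranges, and the way finished windows are tracked are all assumptions, and the docling conversion step is left out entirely:

```python
def page_windows(total_pages: int, window: int) -> list[tuple[int, int]]:
    """Split pages 1..total_pages into inclusive (start, end) windows."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        (start, min(start + window - 1, total_pages))
        for start in range(1, total_pages + 1, window)
    ]


def pending_windows(
    windows: list[tuple[int, int]], done: set[tuple[int, int]]
) -> list[tuple[int, int]]:
    """Skip windows that were already converted, so a rerun picks up where it stopped."""
    return [w for w in windows if w not in done]


windows = page_windows(total_pages=23, window=10)
# [(1, 10), (11, 20), (21, 23)]
remaining = pending_windows(windows, done={(1, 10)})
# [(11, 20), (21, 23)]
```

Only one window's pages need to be in memory at a time, and the per-window outputs can be concatenated in order once every window is done.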

- [GitHub](https://github.com/catherinepereira/doctape)
- [PyPI](https://pypi.org/project/doctape/)

**Tech:** Python

### F1 Pit Wall

Website hosting transcriptions of all driver radio communications from the 2025 F1 season.
Transcriptions were processed from raw driver radio audio using a custom audio processing pipeline, Whisper model prompting, and post-transcription refinement using LLMs.

- [Project Link](https://f1pitwall.vercel.app)
- [GitHub (Full Dataset)](https://github.com/catherinepereira/f1-2025-radio-transcriptions)

**Tech:** Audio ML, LLMs, Python, React, TypeScript

### F1Guessr

Web game inspired by GeoGuessr for F1 fans to guess the year and grand prix from a photo of the race.
Screencaps were automatically collected from F1 highlight videos available on YouTube and filtered for quality using a custom fine-tuned vision model.

- [Project Link](https://f1guessr.com)

**Tech:** Computer Vision, Python, PyTorch, React, TypeScript

### Roblox BedWars
*Easy Games (2022 - 2025)*

- Led technical and creative development of over 10 purchasable and playable in-game characters
- Contributed to a weekly content update schedule for over 3 years
- Designed and implemented match reconnect support with asynchronous match performance finalization
- Worked on the 'BedWars Creative' in-game UGC system featuring a custom Lua scripting API and tooling
- Designed and built the in-game tournament team creation, assignment, and event scheduling system

- [Roblox](https://bedwars.com)

**Tech:** Lua, React, Roblox Engine, TypeScript, Grafana, PlayFab, Redis, Snowflake

### Roblox Islands
*Easy Games (2022)*

- Contributed to a weekly content update schedule
- Led technical development of boss fights and interactable characters
- Added various mini-games, events, and quests

- [Roblox](https://www.roblox.com/games/4872321990/)

**Tech:** Lua, React, Roblox Engine, TypeScript, Grafana

## Explorables

### Embeddings Playground

Audio embedding and dimensionality reduction visualizer using audio data sourced from the FreeMusicArchive music library.

- [Project Link](https://embeddings-playground-cat.vercel.app/)
- [GitHub (Frontend)](https://github.com/catherinepereira/embeddings-playground)
- [GitHub (Vectorization)](https://github.com/catherinepereira/embeddings-playground-scripts)

**Tech:** Audio ML, Python, React, TypeScript

### BPE Playground

Interactive step-through visualization of the Byte Pair Encoding algorithm as implemented in GPT-2's tokenizer.
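
For a feel of what a single step looks like, here is a minimal sketch of one BPE merge on a character sequence. It is a simplification and not the playground's code: the function names are hypothetical, and GPT-2's real tokenizer works on bytes with a pre-tokenization pass and a learned merge table rather than counting pairs at encode time:

```python
from collections import Counter


def most_frequent_pair(tokens: list[str]) -> tuple[str, str]:
    """Return the adjacent token pair that occurs most often."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0]


def merge_pair(tokens: list[str], pair: tuple[str, str]) -> list[str]:
    """Replace each left-to-right occurrence of `pair` with one merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


tokens = list("aaabdaaabac")
pair = most_frequent_pair(tokens)      # ("a", "a"), seen 4 times
tokens = merge_pair(tokens, pair)
# ["aa", "a", "b", "d", "aa", "a", "b", "a", "c"]
```

Repeating this step, with each merged pair added to the vocabulary, is what builds up a merge table.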

- [Project Link](https://bpe-playground.vercel.app/)
- [GitHub (Frontend)](https://github.com/catherinepereira/bpe-visualizer)

**Tech:** LLMs, React, TypeScript

### CNN Playground

Browser-based interactive tool for visualizing how convolutional operations transform images in real time.

- Animated kernel sliding with per-position multiply-add breakdowns in single conv mode
- Playground mode for adjusting kernel values, stride, and padding with live output updates
- Multi-layer mode stacking up to 3 conv layers with ReLU and max pooling to observe feature map evolution
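
The multiply-add that the kernel animation walks through can be sketched in plain Python. This is an illustrative single-channel version with zero padding, not the playground's code, and the function name is hypothetical. Like most deep learning frameworks, it slides the kernel without flipping it:

```python
def conv2d(image, kernel, stride=1, padding=0):
    """Slide `kernel` over a zero-padded 2D `image` and sum the products."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])

    # Zero-pad the image on all four sides.
    padded = [[0.0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for r in range(h):
        for c in range(w):
            padded[r + padding][c + padding] = image[r][c]

    # Output size: floor((size + 2 * padding - kernel) / stride) + 1
    out_h = (h + 2 * padding - kh) // stride + 1
    out_w = (w + 2 * padding - kw) // stride + 1

    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    acc += padded[i * stride + u][j * stride + v] * kernel[u][v]
            row.append(acc)
        output.append(row)
    return output


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
kernel = [
    [1, 0],
    [0, -1],
]
plain = conv2d(image, kernel)                        # 2x2 output
strided = conv2d(image, kernel, stride=2, padding=1)  # 2x2 output
```

With this kernel, every position of `plain` is the top-left value minus the bottom-right value of the 2x2 patch under it, which is -4 everywhere for this image.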

- [Project Link](https://explorable.cv/cnn-playground)
- [GitHub](https://github.com/catherinepereira/explorablecv)

**Tech:** Computer Vision, React, TypeScript

### CNN Visualizer

Web app that visualizes what a trained CNN model perceives at each layer when processing CIFAR-10 images.

- Layer-by-layer feature map visualization as grayscale grids with per-class prediction probabilities

- [Project Link](https://explorable.cv/cnn-visualizer)
- [GitHub](https://github.com/catherinepereira/explorablecv)
- [GitHub (Model)](https://github.com/catherinepereira/cnn-from-scratch-model)

**Tech:** Computer Vision, Python, PyTorch, React, TypeScript

### CNN Architecture Comparison

Interactive web app that compares six major CNN architectures (LeNet-5, AlexNet, VGG-11, Inception-mini, ResNet-20, DenseNet-BC) side-by-side on the same image.

- Parallel client-side model inference using ONNX Runtime Web
- Real-time feature map visualization and per-architecture diagrams

- [Project Link](https://explorable.cv/cnn-architecture-comparison)
- [GitHub](https://github.com/catherinepereira/explorablecv)
- [GitHub (Model)](https://github.com/catherinepereira/cnn-architecture-comparison-model)

**Tech:** Computer Vision, Python, PyTorch, React, TypeScript

### CV Interpretability Explorer

Compares three image classifiers (Custom CNN, ResNet-18, ViT-S) trained on ImageNette, showing what each one looks at.

- Grad-CAM, Score-CAM, saliency, LIME, and ViT attention rollout attributions side by side
- UMAP projection of every model's penultimate-layer features

- [Project Link](https://explorable.cv/cv-interpretability)
- [GitHub](https://github.com/catherinepereira/explorablecv)
- [GitHub (Model)](https://github.com/catherinepereira/cv-interpretability-model)

**Tech:** Computer Vision, Python, PyTorch, React, TypeScript

### Transformer Playground

Runs a pretrained BERT-tiny in the browser and visualizes a transformer end to end on whatever sentence you type.

- Step through tokenization, the Q/K/V projections, scaled dot-product scoring, attention, value aggregation, and the feed-forward block, each panel driven by the model's own tensors
- Predicts masked words from the model's logits, then a causal-mask toggle shows how the same blocks become a GPT-style decoder
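
The scoring, attention, and value-aggregation steps for a single head can be sketched in plain Python. This is a toy illustration on hand-written 2-dimensional vectors, not the playground's code or BERT-tiny's real tensors, and the function names are hypothetical. The `causal` flag mimics the effect of a causal mask by hiding later positions:

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V, causal=False):
    """Scaled dot-product attention for one head, on plain nested lists."""
    d_k = len(K[0])
    outputs = []
    for i, q in enumerate(Q):
        # Score the query against every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        if causal:
            # Hide later positions so token i only sees tokens 0..i.
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        weights = softmax(scores)
        # Aggregate: weighted sum of the value vectors.
        outputs.append(
            [sum(w * v[c] for w, v in zip(weights, V)) for c in range(len(V[0]))]
        )
    return outputs


Q = K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
full = attention(Q, K, V)
masked = attention(Q, K, V, causal=True)
```

With the mask on, the first token can only attend to itself, so `masked[0]` is just `V[0]`. The last token sees every position either way, so its output is unchanged.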

- [Project Link](https://transformer-playground.vercel.app/)
- [GitHub (Frontend)](https://github.com/catherinepereira/transformer-playground)
- [GitHub (Model)](https://github.com/catherinepereira/transformer-playground-model)

**Tech:** LLMs, Python, PyTorch, React, TypeScript

### ViT Playground

Visualizes how a Vision Transformer turns an image into a sequence of tokens, then runs ViT-tiny end to end in the browser.

- Step through patch splitting, the prepended [CLS] token and position indices, and the linear projection into patch embeddings
- Explore the encoder's attention head by head, roll it up into a single map of where the model looks, and watch ViT-tiny predict on a sample or your own image
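
Patch splitting and position indices can be sketched on a tiny single-channel grid. This is a simplified illustration, not the playground's code: the function name is hypothetical, the 4x4 image and 2x2 patch size are made up for readability, and the learned linear projection is left out:

```python
def patchify(image, patch):
    """Split an H x W grid into flattened, non-overlapping patch x patch blocks (row-major)."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image sides must be divisible by the patch size")
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            patches.append(
                [image[top + r][left + c] for r in range(patch) for c in range(patch)]
            )
    return patches


# A 4x4 "image" whose pixel values are just 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = patchify(image, patch=2)
# 4 patches of 4 values each; the first is [0, 1, 4, 5]

seq_len = len(patches) + 1              # +1 for the prepended [CLS] token
position_ids = list(range(seq_len))     # [0, 1, 2, 3, 4]
```

Each flattened patch would then be multiplied by a learned projection matrix to become one token embedding.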

- [Project Link](https://explorable.cv/vit-playground)
- [GitHub](https://github.com/catherinepereira/explorablecv)

**Tech:** Computer Vision, PyTorch, React, TypeScript