# About Me

I graduated with my PhD in Computer Science / NLP from the [University of Maryland]() in 2022. For my dissertation, I did research on [building NLP tools for collection browsing](). My advisors were [Philip Resnik]() and [Doug Oard]().

Since then, I have worked at:

- [Adobe Research](https://adobe.com), in their Document Intelligence Lab
- [Pattern Data](https://patterndata.ai), leading their ML/NLP work

I am __not__ Bengals quarterback [Joe Burrow](), for those of you who landed here Googling him. 😉

# Penpusher

My hobby project is [penpusher.app](https://penpusher.app), an AI-powered form-filling service. Email me, DM me, or sign up for the waitlist if you want beta access.

# Blog Posts

## The Document AI Landscape

A series of posts in which I dig deep into the current state of document AI: summarizing model releases, datasets, and interesting papers, and running a few experiments of my own.

**[[1. Tracking Large Multimodal Model (LMM) Announcements and Metrics (2023-2024)]]** _December 2024_ `🚧 under construction`
Looking at the landscape of document tasks that model labs publish benchmark results on.

**[[2. What can we learn about LMMs from Document Question Answering?]]** _January 2025_ `🚧 under construction`
An in-depth analysis of the DocVQA dataset. What is it, how was it constructed, and what can we learn from it?

**[[3. What can we learn about LMMs from Charts?]]** _February 2025_ `🚧 under construction`
A look at chart datasets and LMM performance on them. What are the limitations, and why aren't current models *there* yet?

## Paper Notes and Miscellaneous

Various paper notes, plus some explainers, guides, and general thoughts.

**[[Google Gemini 101 - Object Detection with Vision and Structured Outputs]]** _December 19th, 2024_
The missing manual for getting up and running quickly with Gemini 2.0 Flash: making calls with vision mode, getting structured returns, and getting/showing bounding boxes. We build a neat little zero-shot tea set detector. (**[Code](https://gist.github.com/jbarrow/c35279cbed578eeba4e1253dc6907c8c)**)

**[[Google Gemini 102 - Advanced Structured Outputs]]** _December 26th, 2024_ `🚧 under construction`
A deeper look at Gemini's structured outputs: required fields, the subset of the OpenAPI spec it supports, `$refs`, `$defs`, and using it with `openai`, oh my!

**[[Speeding up Visual Question Answering 10x (Faster ANLS)]]** _January 8th, 2025_ `🚧 under construction`
An explanation of Average Normalized Levenshtein Similarity (ANLS), which is used as a metric for evaluating document question answering (e.g. [[2. What can we learn about LMMs from Document Question Answering?|DocVQA]]). By swapping how edit distance is computed, we can get a 10x speedup and still pass all the tests.

**[[Evaluate Your Structured Outputs with ANLS*]]** _January 15th, 2025_ `🚧 under construction`
An explanation of ANLS*, a metric for evaluating typed/structured question answering.

%%
`🚧 under construction` How does localization work (and not work) in LMMs.
%%

## `TinyHNSW`: Build Your Own Vector Database

A series about how HNSW vector search works, from zero to a working approximate nearest neighbors (ANN) library in Python, with minimal dependencies (`numpy` and `networkx`, plus `matplotlib` for visualization).

**All the code is at [github.com/jbarrow/tinyhnsw](https://github.com/jbarrow/tinyhnsw).**

**[[0. Introduction to `TinyHNSW`|0. Introduction to `TinyHNSW`]]**
**Start here to build your own (reasonably efficient) vector database.** We'll start with a naive implementation, then go through how HNSW works piece-by-piece, and eventually put it all together into a simple and clean Python HNSW implementation.
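To give a flavor of where the series starts, here is a minimal sketch of that naive baseline: exact, brute-force nearest-neighbor search over a `numpy` matrix. The names here are illustrative and not necessarily the ones used in `tinyhnsw`; the point is that HNSW exists to (approximately) avoid this full scan over every stored vector.

```python
import numpy as np

def brute_force_search(index: np.ndarray, query: np.ndarray, k: int = 5):
    """Exact k-NN by cosine similarity: compare the query to every row of `index`.

    This costs O(n * d) per query, which is the scaling that HNSW's
    graph-based search is designed to sidestep.
    """
    # Normalize rows and query so a dot product equals cosine similarity.
    index_norm = index / np.linalg.norm(index, axis=1, keepdims=True)
    query_norm = query / np.linalg.norm(query)

    scores = index_norm @ query_norm           # similarity to every stored vector
    top_k = np.argsort(-scores)[:k]            # indices of the k most similar rows
    return top_k, scores[top_k]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vectors = rng.normal(size=(10_000, 128))   # 10k random 128-d "documents"
    query = rng.normal(size=128)
    ids, sims = brute_force_search(vectors, query, k=5)
    print(ids, sims)
```

Even once the approximate index is built, an exact search like this is handy as a correctness and recall reference.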
**[[1. 🚧 Nearest Neighbor Search and Vector Databases]]**

**[[2. 🚧 An Overview of HNSW]]**

# Publications

```
working on transferring publications over from:
https://scholar.google.com/citations?user=A0mvKlgAAAAJ&hl=en
```

# Contact Me

Feel free to reach out to me at:

```
joseph.d.barrow [at] gmail [dot] com
```

If you **include the URL of my blog in the subject line**, I'll look at the email.