From a Folder of Downloaded Papers to One Ordered Reading File — Locally
Short answer: If your research lives as a chaotic folder of downloaded PDFs, scanned book chapters and screenshots of figures, the painful part isn't the reading — it's assembling the pile into something you can actually read in order. PDF Insight is a local desktop app that does the assembly. Drop a topic folder on it, type the order you want in plain English ("oldest to newest, foundational papers first, then the recent ones"), and a local AI reads each file, classifies it, sorts it, and merges everything into one ordered PDF with a page citation back to each source — all on your own machine, nothing uploaded. On-device OCR makes scanned chapters and figure screenshots searchable so they get sorted too. It does not summarize the papers or format your citations — the thinking stays yours; it just clears the file-wrangling out of the way.
The literature pile every grad student knows
It always starts tidy. A new topic, a fresh folder, three or four key papers. Then a month goes by and the folder has become a landfill: forty PDFs named 1-s2.0-S0001457520301.pdf and (3) (1) final.pdf, a few scanned chapters you photographed in the library, screenshots of one figure you needed from a paywalled article, a government report, a preprint, and your supervisor's "you should really read this" email attachment.
You don't have a knowledge problem. The sources are right there. You have an ordering problem. When it's time to actually sit and read the literature for a chapter, a comprehensive exam, or a systematic review, you want it in one continuous document, in a sensible order — not forty browser tabs and a Finder window you have to keep re-sorting by hand.
The fix everyone improvises is renaming files 01_, 02_, 03_ so they merge in the right sequence, then redoing it the moment a new source shows up. That's the busywork a tool can take off your plate.
Drop the folder, describe the order, get one reading file
PDF Insight is deliberately narrow. It does one job well: it turns a folder of messy PDFs and images into one ordered, page-cited file, on your own machine. The workflow is short:
- Put the topic in one folder. Downloaded articles, exported preprints, scanned book chapters, screenshots of figures or tables, the odd government PDF — all of it, in no particular order. Messy is the point.
- Tell it the order in plain English. The way you'd explain it to a labmate, not a settings panel: "Sort by publication year, oldest first. Peer-reviewed articles first, then preprints, then reports. OCR anything scanned."
- Let the local AI read and sort. It reads each document, works out what it is and roughly when it's from, and on-device OCR (Tesseract, English + French) makes scans and screenshots legible so they land in the right place instead of a "mystery image" pile.
- Get one merged PDF with page citations. Every section traces back to its source file and page, so when you're drafting and need to find which paper made a claim, it's a lookup, not a scavenger hunt.
You go from an afternoon of dragging thumbnails to typing one sentence and waiting a few minutes.
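If you're wondering what a "page citation" amounts to in practice, here is a minimal Python sketch of the idea. It is an illustration, not PDF Insight's actual code: it assumes the sources are already in their final order and that each one's page count is known, and the `build_page_index` helper and the filenames are hypothetical. Every page of the merged file maps back to a (source file, source page) pair.

```python
from typing import Dict, List, Tuple


def build_page_index(sources: List[Tuple[str, int]]) -> Dict[int, Tuple[str, int]]:
    """Map each 1-based page of a merged file to (source file, 1-based source page).

    Hypothetical helper for illustration only. `sources` is assumed to be the
    already-sorted list of (filename, page_count) pairs, in merge order.
    """
    index: Dict[int, Tuple[str, int]] = {}
    merged_page = 1
    for filename, page_count in sources:
        # Each source's pages follow directly after the previous source's pages.
        for source_page in range(1, page_count + 1):
            index[merged_page] = (filename, source_page)
            merged_page += 1
    return index


# Made-up filenames and page counts, already sorted oldest to newest.
ordered = [
    ("smith_1987_scan.pdf", 3),
    ("lee_2004.pdf", 12),
    ("preprint_2023.pdf", 8),
]
index = build_page_index(ordered)
print(index[5])  # → ('lee_2004.pdf', 2)
```

Page 5 of the merged file resolves to page 2 of lee_2004.pdf because the three-page scan occupies merged pages 1 to 3. That lookup is the whole point: a claim in the reading file traces straight back to its origin.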
Plain-English orderings you can steal
There's no query syntax to learn. You describe the order the way you'd say it out loud. A few that work well for research folders:
"Order by publication year, oldest to newest, so I can read the field as it developed."
"Group by theme: methods papers first, then empirical studies, then reviews. Within each group, newest first."
"Sort alphabetically by first author's last name — I want it to match my reference list."
"Put the three foundational papers first in the order I'd assign them, then everything else by year, and OCR the scanned chapters."
Find the ordering that matches how you think about the topic, and you can reuse the same directive every time you add sources — the reading file stays consistent as the project grows.
It reads scanned chapters and figure screenshots
Not all of research arrives as a clean digital PDF. You photograph a chapter from a library book that isn't available online. You screenshot a single figure from a paper because that's the one thing you needed. A colleague sends a scanned, slightly crooked copy of an old paper that predates digital archives.
Left alone, those images are dead weight in a merge — flat pictures with no searchable text, dumped wherever. PDF Insight runs OCR on them on your machine, so a scanned chapter or a figure screenshot becomes readable text the classifier can place correctly. The crooked 1987 scan ends up next to the other 1980s work, not stranded at the end of the file.
Reasonable expectation: OCR quality tracks image quality. A sharp screenshot or a flatbed scan reads cleanly; a blurry, poorly lit phone photo of a glossy page will be rougher. It's good enough to classify and sort the source — it is not a typesetting tool.
Why offline is the point for unpublished work
Researchers handle material that genuinely shouldn't leave the building: your own unpublished drafts, a manuscript under embargo, papers a reviewer sent you in confidence, interview transcripts with real people in them, data you're contractually bound to protect. Uploading all of that to some web tool just to put it in order is exactly the kind of quiet risk that turns into an ethics-board conversation.
You don't have to. PDF Insight runs 100% locally by default: the AI reads and classifies on your own computer, and nothing is uploaded in the local tier. It works fully offline; you could assemble your whole reading file on a plane with the Wi-Fi off. There's an optional paid cloud speed lane, but it's off unless you turn it on, and it's clearly labelled when you do. If "where does my data actually go" matters to you (and for unpublished work it should), see "Where Your Files Actually Live: Cloud vs Local-First" and "Is It Safe to Use AI on Sensitive Documents?".
Before and after, at a glance
| | Before (the renamed-files method) | After (one reading file) |
|---|---|---|
| What you open to read | 40+ files across Downloads and tabs | One ordered, page-cited PDF |
| Order | Whatever the filenames force | By year, theme or author — your call |
| Scanned chapters & figure screenshots | Flat, unsearchable images | Made searchable by on-device OCR |
| Tracing a claim to a source | Scroll-and-pray | Follow the page citation in seconds |
| Unpublished / embargoed material | Uploaded to a web tool to sort | Never leaves your machine |
What this is NOT (so you don't expect the wrong thing)
Credibility cuts both ways, so let's be straight about the limits.
PDF Insight does not summarize your papers. It will not condense an abstract, extract the argument, build a literature matrix, or tell you what a study found. "Reading" here means it understands what each document is and where it belongs in the stack — not what it says. The actual reading and thinking are still your job, which is how it should be.
It is also not a reference manager. It does not replace Zotero, Mendeley or EndNote: it won't format citations in APA or MLA, generate BibTeX, or maintain a bibliography. The "page citations" it adds are internal pointers from the merged file back to each source page — useful for tracing, not for your works-cited list. And it's not cloud storage or a note-taking app.
What it is: a focused tool that takes the pile of PDFs and images you already have and turns it into one clean, ordered, cited reading file — so the assembly stops being the thing standing between you and the literature. If that's your bottleneck, it helps a lot. If you need something to do the reading for you, this isn't it — and we'd rather say so up front.
Turn your topic folder into one reading file
Point PDF Insight at a real, messy topic folder (PDFs, scanned chapters, figure screenshots and all), type the order you want in plain English, and get back one ordered, page-cited reading file built entirely on your own machine, with nothing uploaded. There's a 14-day free trial, no credit card required. There's also a one-time Solo perpetual licence at $49 CAD for local-only use, so you can buy it once instead of renting it forever.
FAQ
Does it summarize the papers or write my literature review?
No — and it's important to be clear about that. PDF Insight sorts, classifies and merges your PDFs and scans into one ordered, page-cited file. It does not summarize, paraphrase, extract arguments, or do any of the reading for you. The thinking stays yours; it just removes the file-wrangling between you and the reading.
Does it format citations or build a bibliography (APA, MLA, BibTeX)?
No. It is not a reference manager like Zotero, Mendeley or EndNote, and it does not generate citation strings or a bibliography. The "page citations" it adds are internal pointers — for each section of the merged file you get a reference back to the source file and page it came from, so you can trace any passage to its origin.
Can it read scanned book chapters and screenshots of figures?
Yes. On-device OCR (Tesseract, English and French) makes scanned chapters, photographed pages and figure screenshots searchable, so they get classified and sorted into the right place instead of being dumped as unreadable images.
Is my unpublished or embargoed material uploaded anywhere?
Not in the local tier, which is the default. Reading, OCR, classification and merging all run on your own machine and work fully offline, so drafts, embargoed papers and confidential sources never leave your computer. There's an optional paid cloud lane for extra speed, but it stays off unless you deliberately turn it on.
What does it cost?
Every plan starts with a 14-day free trial, no credit card required. There's a one-time Solo perpetual licence at $49 CAD for local-only use, so a student can buy it once instead of paying a subscription forever.