How we evaluate document intelligence quality.
Nebula is evaluated on whether its output can support real LLM and RAG workflows — answering questions, reasoning over tables, interpreting charts, and preserving document hierarchy in Japanese and English business documents.
We measure what enterprises actually care about.
Our evaluation rubric is built around what enterprise teams need from document intelligence — usable output, real document coverage, and Japanese-strong quality.
Real documents, not toy benchmarks
We evaluate on the documents enterprises actually process — IR releases, board decks, statements, government forms — not curated test sets.
Japanese as a first-class workload
Japanese business and legal documents are part of our core evaluation set, not an afterthought tested on a handful of pages.
Output, not surface text
Surface character accuracy is a floor. We evaluate whether the output is actually useful to the LLM and RAG systems consuming it.
Validated before we publish
When a capability shows up on this page as validated, it has been evaluated across representative customer documents — not just a single example.
Five things every Nebula evaluation answers.
Each criterion below is applied across customer-representative documents. Where output falls short, the gap is documented and routed back into the product roadmap.
Downstream LLM answerability
Can a model answer real business questions using the converted Markdown and structured JSON, without going back to the original PDF?
Markdown usability
Are headings, reading order, lists, footnotes, and document hierarchy preserved end-to-end?
Table & chart reasoning
Do tables and chart series support numerical and comparative reasoning after conversion?
Japanese business documents
Does the system handle Japanese legal, financial, and IR materials, including mixed JP/EN pages?
Enterprise document structure
Can slides, reports, statements, forms, and operational files remain useful after conversion?
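The first criterion above, downstream answerability, can be sketched as a tiny scoring harness: pose gold questions against the converted Markdown alone and count correct answers. This is an illustrative sketch, not Nebula's actual evaluation pipeline; `ask_model` is a hypothetical stand-in for whatever LLM the harness queries, and the normalization rule is an assumption.

```python
# Illustrative sketch of a downstream-answerability check.
# `ask_model` is a hypothetical stand-in for the LLM under evaluation;
# Nebula's real harness is not shown here.
import re

def normalize(answer: str) -> str:
    """Lowercase and strip punctuation/whitespace so surface variation
    (e.g. '¥1,200' vs '1200') does not dominate the score."""
    return re.sub(r"[^0-9a-z]+", "", answer.lower())

def answerability_score(qa_pairs, converted_markdown, ask_model) -> float:
    """Fraction of gold questions the model answers correctly using
    ONLY the converted Markdown, never the original PDF."""
    correct = 0
    for question, gold in qa_pairs:
        predicted = ask_model(context=converted_markdown, question=question)
        if normalize(predicted) == normalize(gold):
            correct += 1
    return correct / len(qa_pairs)

# Usage with a trivial stub model that reads the answer off the Markdown:
doc = "# FY2024 Results\n\n| Metric | Value |\n|---|---|\n| Revenue | 1200 |"
stub = lambda context, question: "1200" if "Revenue" in context else ""
print(answerability_score([("What was revenue?", "1200")], doc, stub))  # 1.0
```

The point of the sketch is the constraint, not the scoring: the model sees only the converted output, so any question it can no longer answer localizes a conversion gap.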
Where Nebula has been validated, and where we are still expanding.
Where we have measured a capability against representative customer documents, we say so. Where the evaluation set is still small, we say that too.
Charts
Bar, line, pie, and multi-panel scientific figures returned as structured chart data.
Tables
Multi-level grouped headers, hierarchical row labels, merged cells, numerical fidelity.
Handwritten documents
Cursive English manuscripts and vertical Japanese handwritten manuscripts (自筆原稿), transcribed verbatim.
Forms
Multi-section forms with checkboxes, line numbers, and dependents grids preserved.
Japanese financial documents
Annual reports, IR releases, governance materials with Japanese business vocabulary.
Legal & regulatory PDFs
Long-form Japanese legal text with footnotes, citations, and nested headings preserved in reading order.
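As a concrete illustration of the chart capability above, a converted bar chart might come back as structured series data that supports comparative reasoning directly. The shape below is a hypothetical example of such output; the field names are illustrative assumptions, not Nebula's actual JSON schema.

```python
# Hypothetical shape for structured chart output; field names are
# illustrative assumptions, not Nebula's actual JSON schema.
chart = {
    "type": "bar",
    "title": "Revenue by segment (FY2024)",
    "x_axis": {"label": "Segment"},
    "y_axis": {"label": "Revenue (¥M)"},
    "series": [
        {"name": "FY2023", "points": {"Cloud": 820, "Devices": 410}},
        {"name": "FY2024", "points": {"Cloud": 1040, "Devices": 380}},
    ],
}

# With the series as data rather than pixels, comparative questions
# become straightforward lookups and arithmetic:
fy23 = chart["series"][0]["points"]
fy24 = chart["series"][1]["points"]
growth = {seg: fy24[seg] - fy23[seg] for seg in fy24}
print(growth)  # {'Cloud': 220, 'Devices': -30}
```

This is the distinction the rubric draws between surface text and usable output: the same chart rendered as a flat caption would pass character-level checks yet fail the comparison above.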
Common questions about evaluation.
How does Nebula evaluate document quality?
We evaluate on whether converted documents can support real downstream AI work — answering questions, reasoning over tables, interpreting charts, and preserving document hierarchy. Surface-level character matching is a baseline, not the goal.
Which document types have you validated?
Charts, tables, handwritten documents, forms, Japanese financial materials, and legal/regulatory PDFs are part of our validated set, with representative examples shown on the main Nebula page. We continue to expand the customer-representative corpora as new partners onboard.
How do you handle Japanese documents specifically?
Japanese business and legal documents are a core part of our evaluation rubric. We test mixed JP/EN layouts, bilingual tables, IR materials, and long-form regulatory PDFs. Japanese is a first-class workload, not a translated afterthought.
Can I send documents to be evaluated?
Yes. The fastest way is to try Nebula directly at nebula.ur-ai.net — sign in and run your own documents through it. We are especially interested in feedback on board decks, annual reports, regulatory filings, statements, expense files, and Japanese enterprise materials.
Try Nebula on your own documents.
The fastest way to evaluate Nebula on your own corpus is to try it directly. Sign in, upload a board deck, annual report, regulatory filing, statement, or any Japanese enterprise document, and see the output for yourself.