Extraction API

Usage of the model is billed per token when you extract information from documents. The price depends on the number of input tokens (template, examples, and input document) and output tokens (the JSON output).

Model              Input tokens   Output tokens   Token size (Image)   Token size (Text)
NuExtract 2.0 PRO  $1/M tokens    $5/M tokens     32x32 pixels         word or sub-word

If you need to process a large number of documents at a lower price, do not hesitate to talk to us. Per-token prices can be lowered significantly for large volumes, either through batching or by using a fine-tuned model.
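As an illustration of how the two rates combine, here is a minimal Python sketch that estimates the list-price cost of a single request. The function name `estimate_cost` and the example token counts are hypothetical, chosen only for illustration; this is not part of the NuExtract API, and volume discounts are not modeled.

```python
# List prices from the table above, in USD per million tokens.
INPUT_PRICE_PER_M = 1.0   # NuExtract 2.0 PRO, input tokens
OUTPUT_PRICE_PER_M = 5.0  # NuExtract 2.0 PRO, output tokens


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated list-price cost in USD for one extraction request.

    Hypothetical helper for illustration; not an official API.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens * INPUT_PRICE_PER_M / 1_000_000
    output_cost = output_tokens * OUTPUT_PRICE_PER_M / 1_000_000
    return input_cost + output_cost


# Assumed example: 1,800 input tokens (template, examples, and document)
# and 200 output tokens of JSON.
cost = estimate_cost(input_tokens=1_800, output_tokens=200)
print(f"${cost:.4f}")  # prints $0.0028
```

Although output tokens cost five times as much as input tokens, the extracted JSON is often much shorter than the input document, so the input side can still account for most of the bill.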

Estimating Token Numbers

Text Tokens: In English, 1 word is about 1.3 tokens on average, so a typical page (roughly 750 words) contains about 1000 tokens. Some languages have a higher average token count per word.

Image Tokens: With NuExtract 2.0 PRO, a token corresponds to a patch of 32x32 pixels. An A4 page rasterized at 115 dpi corresponds to about 1500 tokens.

Here is a price-estimation chart for typical documents:

Modality   Size                       Input Tokens   Input Price
Text       1 page                     ~1000          ~$0.001
Text       100 pages                  ~100k          ~$0.10
Image      1 A4 page at 115 dpi       ~1500          ~$0.0015
Image      100 A4 pages at 115 dpi    ~150k          ~$0.15

NB: PDFs and other formatted documents are converted to images by default.
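The rules of thumb above can be turned into a quick estimator. The sketch below is only an approximation under the figures quoted in this section (about 1.3 tokens per English word, about 1000 tokens per text page, about 1500 tokens per A4 page image at 115 dpi); the names `estimate_text_tokens` and `estimate_input` are hypothetical, and real token counts will vary with the document and language.

```python
# Rules of thumb quoted in this section; actual counts vary by document.
TOKENS_PER_WORD = 1.3             # English text, on average
TOKENS_PER_TEXT_PAGE = 1_000      # typical page of text
TOKENS_PER_A4_IMAGE_PAGE = 1_500  # A4 page rasterized at 115 dpi
INPUT_PRICE_PER_M = 1.0           # USD per million input tokens


def estimate_text_tokens(word_count: int) -> int:
    """Approximate the token count of English text from its word count."""
    return round(word_count * TOKENS_PER_WORD)


def estimate_input(pages: int, modality: str) -> tuple[int, float]:
    """Return (input tokens, input price in USD) for a number of pages.

    modality is "text" or "image"; PDFs are converted to images by default,
    so they would normally be estimated as "image".
    """
    per_page = {
        "text": TOKENS_PER_TEXT_PAGE,
        "image": TOKENS_PER_A4_IMAGE_PAGE,
    }
    if modality not in per_page:
        raise ValueError(f"unknown modality: {modality!r}")
    tokens = pages * per_page[modality]
    price = tokens * INPUT_PRICE_PER_M / 1_000_000
    return tokens, price


# Reproduce the rows of the chart above.
for modality, pages in [("text", 1), ("text", 100), ("image", 1), ("image", 100)]:
    tokens, price = estimate_input(pages, modality)
    print(f"{modality:5s} {pages:4d} page(s): ~{tokens} tokens, ~${price:.4f}")
```

These figures cover input tokens only; the output side depends on how much JSON the template produces.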

Private Platform

The NuExtract Platform can also be used fully privately, i.e. deployed on a private cloud of your choice or even on-premises. There are three reasons why you may want to use a private platform:

  1. Get provable (by-design) confidentiality/privacy for your documents.
  2. Improve extraction performance via fine-tuning customization.
  3. Reduce inference price to process a large number of documents (typically >10M pages per year).

Currently, you need to talk to us to set this up.

Cloud Marketplaces

Not yet available.