Extraction API

We charge for model usage when you extract information from documents. The price depends on the number of input tokens (template, examples, and input document) and output tokens (JSON output).

Model	Input tokens	Output tokens	Token size (Image)	Token size (Text)
NuExtract 2.0 PRO	$1/M tokens	$5/M tokens	32x32 pixels	word or sub-word
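As a rough illustration of how the two rates combine, the cost of a single extraction call can be sketched as below. The rates come from the table above; the function name `extraction_cost` and the token counts in the example request are hypothetical and chosen only for illustration.

```python
# Rates from the pricing table (USD per million tokens).
INPUT_PRICE_PER_M = 1.0
OUTPUT_PRICE_PER_M = 5.0


def extraction_cost(template_tokens, example_tokens, document_tokens, output_tokens):
    """Estimate the cost in USD of one extraction call (hypothetical helper)."""
    # Input tokens cover the template, the examples, and the input document.
    input_tokens = template_tokens + example_tokens + document_tokens
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical request: 200-token template, no examples,
# a 1500-token document, and 300 tokens of JSON output.
cost = extraction_cost(200, 0, 1500, 300)
print(f"${cost:.4f}")  # prints "$0.0032"
```

Because output tokens cost five times as much as input tokens, a long JSON output can account for a large share of the bill even when the document itself is short.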

If you need to process a large number of documents at a lower price, do not hesitate to talk to us. Per-token prices can be significantly lowered for large volumes, either by batching or by using a fine-tuned model.

Estimating Token Numbers

Text Tokens: In English, one word is about 1.3 tokens on average, so a typical page of roughly 750 words contains about 1000 tokens. Some languages have a higher average token count per word.

Image Tokens: With NuExtract 2.0 PRO, a token corresponds to a patch of 32x32 pixels. An A4 page rasterized at 115dpi corresponds to about 1500 tokens.

Here is a price-estimation chart for typical documents:

Modality	Size	Input Tokens	Input Price
Text	1 page	~1000	~$0.001
Text	100 pages	~100k	~$0.10
Image	1 A4 page at 115dpi	~1500	~$0.0015
Image	100 A4 pages at 115dpi	~150k	~$0.15
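The chart can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes the per-page averages quoted above (about 1000 tokens per text page, about 1500 per A4 image page at 115dpi) and the $1/M input rate; the names `TOKENS_PER_PAGE` and `estimate_input` are hypothetical, and real documents will deviate from these averages.

```python
INPUT_PRICE_PER_M = 1.0  # USD per million input tokens, from the pricing table

# Approximate input tokens per page, taken from the estimates above.
TOKENS_PER_PAGE = {"text": 1000, "image": 1500}


def estimate_input(modality, pages):
    """Return (approximate input tokens, approximate input price in USD)."""
    tokens = TOKENS_PER_PAGE[modality] * pages
    return tokens, tokens * INPUT_PRICE_PER_M / 1_000_000


for modality, pages in [("text", 1), ("text", 100), ("image", 1), ("image", 100)]:
    tokens, price = estimate_input(modality, pages)
    print(f"{modality:5s} {pages:4d} page(s): ~{tokens} tokens, ~${price:.4f}")
```

These figures cover input tokens only; the output-token cost of the extracted JSON comes on top.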

NB: PDFs and other formatted documents are converted to images by default.

Private Platform

The NuExtract Platform can also be used in a fully private way, i.e. deployed on a private cloud of your choice or even on-premises. There are three reasons why you may want to use a private platform:

Get provable (by-design) confidentiality/privacy for your documents.
Improve extraction performance via fine-tuning customization.
Reduce the inference price when processing a large number of documents (typically >10M pages per year).

Currently, you need to talk to us to set this up.

Cloud Marketplaces

Not yet available.