
Document Parser Tool

The Document Parser tool reads and parses documents in four formats (PDF, JSON, CSV, and plain text) from file paths or base64-encoded content. It supports directory-scoped access control and file size limits.

Quick Reference

| Property | Value |
| --- | --- |
| Node name | tools/document-parser |
| Version | 0.1.0 |
| Library | pdf-parse (for PDF), built-in (for JSON, CSV, text) |
| Actions | parsePdf, parseJson, parseCsv, parseText |
| Tags | parser, pdf, csv, json, documents, tools, agentic |

Actions

parsePdf

Parse a PDF document and extract text content and metadata.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| source | string | Yes | File path or base64-encoded PDF |
| sourceType | enum | No | file or base64 (default: file) |

Returns:

```json
{
  "text": "extracted text content...",
  "pages": 12,
  "info": {
    "title": "Document Title",
    "author": "Author Name",
    "pages": 12
  }
}
```
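The parameter table above allows a PDF to be passed inline with sourceType base64. Because PDFs are binary, the raw bytes must be encoded directly rather than decoded as text first. The sketch below builds a parsePdf input object that way, assuming a Node.js runtime for `Buffer` and `node:fs`. `ParsePdfInput`, `pdfInputFromBytes`, and `pdfInputFromFile` are illustrative names, not exports of @flowforgejs/nodes.

```typescript
import { readFileSync } from 'node:fs';

// Input for the parsePdf action, following the parameter table.
// Illustrative type only; not exported by @flowforgejs/nodes.
interface ParsePdfInput {
  action: 'parsePdf';
  source: string;
  sourceType: 'file' | 'base64';
}

// Encode raw PDF bytes for sourceType 'base64'. The bytes are encoded as-is;
// converting them to a UTF-8 string first would corrupt binary content.
function pdfInputFromBytes(bytes: Uint8Array): ParsePdfInput {
  return {
    action: 'parsePdf',
    source: Buffer.from(bytes).toString('base64'),
    sourceType: 'base64',
  };
}

// Read the PDF as a Buffer (no encoding argument) and encode it.
function pdfInputFromFile(path: string): ParsePdfInput {
  return pdfInputFromBytes(readFileSync(path));
}
```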

parseJson

Parse a JSON file or base64-encoded JSON string.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| source | string | Yes | File path or base64-encoded JSON |
| sourceType | enum | No | file or base64 (default: file) |

Returns the parsed JSON value directly in the data field, with no wrapper object.
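With sourceType base64, parseJson presumably decodes the string and then parses the result as JSON. A minimal sketch of those two steps, assuming the payload is UTF-8 text (the page does not say which character encoding the node assumes) and a Node.js runtime for `Buffer`; `parseBase64Json` is a hypothetical helper.

```typescript
// Decode a base64 payload to text, then parse it as JSON.
// Assumes UTF-8; JSON.parse throws a SyntaxError if the text is not valid JSON.
function parseBase64Json(source: string): unknown {
  const text = Buffer.from(source, 'base64').toString('utf8');
  return JSON.parse(text);
}

// Usage: encode a JSON document the way a caller would before sending it.
const payload = Buffer.from(JSON.stringify({ region: 'EU', totals: [10, 20] })).toString('base64');
const parsed = parseBase64Json(payload) as { region: string; totals: number[] };
console.log(parsed.region, parsed.totals.length); // prints: EU 2
```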

parseCsv

Parse a CSV file with a configurable delimiter and header handling.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| source | string | Yes | File path or base64-encoded CSV |
| sourceType | enum | No | file or base64 (default: file) |
| options.delimiter | string | No | Column delimiter (default: ,) |
| options.header | boolean | No | Whether the first row is a header (default: true) |

Returns:

```json
{
  "headers": ["name", "value", "category"],
  "rows": [
    ["alice", "10", "A"],
    ["bob", "20", "B"]
  ],
  "rowCount": 2
}
```

When options.header is false, the headers field is null.
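To illustrate the return shape and the effect of the two options, here is a deliberately naive sketch. It is not how the node parses CSV: it ignores quoting, so a field such as "Smith, Jane" would be split in two, and the page does not say how the node handles quoted fields. It also assumes that with header: false the first line stays in rows. `parseCsvSketch` and `CsvResult` are hypothetical names.

```typescript
interface CsvResult {
  headers: string[] | null;
  rows: string[][];
  rowCount: number;
}

// Naive CSV splitting: no quoted fields, escaped quotes, or embedded newlines.
// Cells stay strings, as in the example output ("10", not 10).
function parseCsvSketch(
  text: string,
  options: { delimiter?: string; header?: boolean } = {},
): CsvResult {
  const delimiter = options.delimiter ?? ',';
  const header = options.header ?? true;
  // Accept LF or CRLF line endings and skip blank lines such as a trailing newline.
  const cells = text
    .split(/\r?\n/)
    .filter((line) => line.length > 0)
    .map((line) => line.split(delimiter));
  // With header: true the first row becomes `headers` and is left out of `rows`.
  const headers = header ? (cells[0] ?? null) : null;
  const rows = header ? cells.slice(1) : cells;
  return { headers, rows, rowCount: rows.length };
}
```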

parseText

Parse a plain text file with optional line-based pagination.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| source | string | Yes | File path or base64-encoded text |
| sourceType | enum | No | file or base64 (default: file) |
| options.offset | number | No | Zero-based index of the first line to return (default: 0) |
| options.limit | number | No | Maximum number of lines to return |

Returns:

```json
{
  "lines": ["line 1", "line 2", "..."],
  "lineCount": 1000,
  "offset": 0,
  "limit": 1000
}
```
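The offset and limit options select a window of lines. A minimal sketch of that windowing on an in-memory string follows; `sliceLines` is hypothetical, and offset is treated as zero-based, consistent with its default of 0. Two details the page leaves open are assumptions here: lineCount is taken to be the total number of lines in the source, and an omitted limit returns everything from offset to the end.

```typescript
interface TextPage {
  lines: string[];
  lineCount: number;
  offset: number;
  limit: number;
}

// Return a window of lines starting at the zero-based `offset`.
// Assumption: lineCount counts every line in the source, not just the window.
function sliceLines(text: string, offset = 0, limit?: number): TextPage {
  const all = text.split(/\r?\n/);
  // A trailing newline ends the last line; it does not start an extra empty one.
  if (all[all.length - 1] === '') all.pop();
  const lines = limit === undefined ? all.slice(offset) : all.slice(offset, offset + limit);
  // Assumption: with no limit, report the number of lines actually returned.
  return { lines, lineCount: all.length, offset, limit: limit ?? lines.length };
}
```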

Output Schema

| Field | Type | Description |
| --- | --- | --- |
| data | object | Parsed document content (structure varies by action) |
| format | string | Format identifier: pdf, json, csv, or text |
| success | boolean | true on success |
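Because data changes shape with the action, code that consumes the output typically branches on format. Below is a sketch of a hypothetical `toPlainText` helper that flattens any successful result into one string, for example before passing it to a downstream step. The casts assume the per-action shapes documented on this page. Joining CSV cells with commas is lossy when cells themselves contain commas, so treat that branch as display output only.

```typescript
type ParseFormat = 'pdf' | 'json' | 'csv' | 'text';

// Envelope fields from the Output Schema table; `data` varies by action.
interface ParserOutput {
  data: unknown;
  format: ParseFormat;
  success: boolean;
}

// Flatten a successful result into one string by branching on `format`.
// The casts assume the per-action `data` shapes documented on this page.
function toPlainText(out: ParserOutput): string {
  if (!out.success) {
    throw new Error('Cannot flatten an unsuccessful parse result');
  }
  switch (out.format) {
    case 'pdf':
      return (out.data as { text: string }).text;
    case 'text':
      return (out.data as { lines: string[] }).lines.join('\n');
    case 'csv': {
      const { headers, rows } = out.data as { headers: string[] | null; rows: string[][] };
      const table = headers ? [headers, ...rows] : rows;
      return table.map((row) => row.join(',')).join('\n');
    }
    case 'json':
      return JSON.stringify(out.data, null, 2);
    default:
      throw new Error(`Unsupported format: ${String(out.format)}`);
  }
}
```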

Configuration Reference

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| allowedDirectories | string[] | (none) | If set, file paths must be within these directories. Not enforced for base64 source type. |
| maxFileSize | integer | 50000000 (50 MB) | Maximum file size in bytes. Files exceeding this limit are rejected. |

Source Types

The sourceType parameter controls how the source string is interpreted:

| Source Type | Behavior |
| --- | --- |
| file | Treat source as a file path and read the file from disk. Subject to allowedDirectories and maxFileSize checks. |
| base64 | Treat source as base64-encoded content and decode it in memory. Not subject to allowedDirectories or maxFileSize checks. |

Safety

Directory scoping

When sourceType is file and allowedDirectories is set, the file path is resolved to an absolute path and checked against each allowed directory. If the path does not fall within any of them, access is denied.
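A containment check like this is easy to get subtly wrong: a plain string-prefix test would accept /data/reports-old/q1.csv for an allowed directory of /data/reports. One common approach, sketched below with Node's path module, compares the resolved path against each directory using path.relative. This illustrates the technique and is not the node's actual code; `isPathAllowed` is a hypothetical name.

```typescript
import { isAbsolute, relative, resolve, sep } from 'node:path';

// Hypothetical helper: is `filePath` inside one of `allowedDirectories`?
// An unset list means no restriction, as in the configuration table.
function isPathAllowed(filePath: string, allowedDirectories?: string[]): boolean {
  if (allowedDirectories === undefined) return true;
  // resolve() makes the path absolute and collapses `.` and `..` segments.
  const target = resolve(filePath);
  return allowedDirectories.some((dir) => {
    const rel = relative(resolve(dir), target);
    // '' means the directory itself. A result of '..', one starting with '../',
    // or an absolute path (e.g. another drive on Windows) lies outside `dir`.
    return rel === '' || (rel !== '..' && !rel.startsWith('..' + sep) && !isAbsolute(rel));
  });
}

console.log(isPathAllowed('/data/reports/quarterly-sales.csv', ['/data/reports'])); // true
console.log(isPathAllowed('/data/reports-old/q1.csv', ['/data/reports'])); // false
```

Note that path.resolve normalizes `..` segments but does not follow symlinks. Resolving both sides with fs.realpathSync would also catch a symlink inside an allowed directory that points elsewhere; the page does not say whether the node does this.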

File size limits

Before reading a file, its size is checked against maxFileSize. Files exceeding the limit are rejected with a descriptive error.
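A minimal sketch of such a pre-read check using fs.statSync, with the documented 50 MB default. `assertFileSize` and its error message are illustrative; they are not the node's actual API or error text.

```typescript
import { statSync } from 'node:fs';

// Documented default for maxFileSize: 50,000,000 bytes (50 MB).
const DEFAULT_MAX_FILE_SIZE = 50_000_000;

// Hypothetical helper: stat the file before reading it and reject it if it is too large.
// Returns the size in bytes when the file is within the limit.
function assertFileSize(filePath: string, maxFileSize: number = DEFAULT_MAX_FILE_SIZE): number {
  const { size } = statSync(filePath);
  if (size > maxFileSize) {
    throw new Error(
      `File ${filePath} is ${size} bytes, which exceeds maxFileSize (${maxFileSize} bytes)`,
    );
  }
  return size;
}
```

Checking with stat and then reading leaves a small window in which the file could grow; comparing the length of the buffer actually read against the same limit closes that gap.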

Note

Directory restrictions and file size checks apply only to the file source type. Base64 input bypasses both: there is no file path to check against allowedDirectories, and maxFileSize is not applied to the decoded content, which is limited only by available memory.
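Since the node does not size-check base64 input, a caller can enforce a comparable limit before sending it. Every 4 base64 characters encode 3 bytes, and each trailing `=` stands for one byte that is absent, so the decoded size can be computed without decoding. The helpers below are a hypothetical caller-side precaution, not part of the node.

```typescript
// Estimate how many bytes a base64 string decodes to, without decoding it.
// 4 characters carry 3 bytes; each trailing '=' marks one missing byte.
function estimateDecodedBytes(base64: string): number {
  const s = base64.replace(/\s+/g, ''); // line breaks and spaces carry no data
  const padding = s.endsWith('==') ? 2 : s.endsWith('=') ? 1 : 0;
  return Math.floor((s.length * 3) / 4) - padding;
}

// Hypothetical guard mirroring maxFileSize (default 50 MB) for base64 input.
function assertBase64WithinLimit(base64: string, maxBytes = 50_000_000): void {
  const bytes = estimateDecodedBytes(base64);
  if (bytes > maxBytes) {
    throw new Error(`Decoded content would be ${bytes} bytes, over the ${maxBytes}-byte limit`);
  }
}
```

At the default 50 MB limit, a padded base64 string can be up to ceil(50,000,000 / 3) × 4 = 66,666,668 characters long.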

Usage Example

```typescript
import { documentParserNode } from '@flowforgejs/nodes';

const workflow = {
  nodes: [
    {
      id: 'parse-report',
      node: documentParserNode,
      config: {
        allowedDirectories: ['/data/reports'],
        maxFileSize: 10_000_000,
      },
      input: {
        action: 'parseCsv',
        source: '/data/reports/quarterly-sales.csv',
        sourceType: 'file',
        options: {
          delimiter: ',',
          header: true,
        },
      },
    },
  ],
};
```

Base64 Example

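```typescript
// Node input for parsing a CSV passed as base64 (e.g. from an API response).
// Buffer is the Node.js global; this object goes in the node's `input` field.
const input = {
  action: 'parseCsv',
  source: Buffer.from('name,score\nalice,95\nbob,87').toString('base64'),
  sourceType: 'base64',
  options: { delimiter: ',', header: true },
};
```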

Tip

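Use parseText with options.offset and options.limit to page through large log files. Returning one window of lines at a time keeps the whole file out of the LLM context window and lets you process it incrementally. Pagination limits what is returned, not what is checked: the file as a whole must still be within maxFileSize.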
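Paging through a file means issuing repeated parseText calls with an advancing offset. The loop below sketches that pattern. The `run` callback (typed `RunParseText`) stands in for however your workflow executes the node and returns its data; it is hypothetical, and it is synchronous here only to keep the sketch short, since a real executor is likely asynchronous. `forEachPage` is likewise an illustrative name.

```typescript
// Shape of parseText's data, as documented in the parseText section.
interface TextPage {
  lines: string[];
  lineCount: number;
  offset: number;
  limit: number;
}

// Input for one parseText call.
interface ParseTextInput {
  action: 'parseText';
  source: string;
  sourceType: 'file';
  options: { offset: number; limit: number };
}

// Hypothetical: however your workflow executes the node and returns its `data`.
type RunParseText = (input: ParseTextInput) => TextPage;

// Request successive windows of `pageSize` lines until a short or empty page
// signals the end of the file, handing each window to `onPage`.
function forEachPage(
  run: RunParseText,
  source: string,
  pageSize: number,
  onPage: (lines: string[], offset: number) => void,
): void {
  let offset = 0;
  for (;;) {
    const page = run({ action: 'parseText', source, sourceType: 'file', options: { offset, limit: pageSize } });
    if (page.lines.length === 0) break;
    onPage(page.lines, offset);
    if (page.lines.length < pageSize) break;
    offset += page.lines.length;
  }
}
```

Stopping on a short page saves one extra call when the line count is not a multiple of the page size; the empty-page check covers the case where it is.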