# Document Parser Tool

The Document Parser tool reads and parses structured documents (PDF, JSON, CSV, and plain text) from file paths or base64-encoded content. It supports directory-scoped access control and file size limits.
## Quick Reference
| Property | Value |
|---|---|
| Node name | `tools/document-parser` |
| Version | 0.1.0 |
| Library | `pdf-parse` (for PDF), built-in (for JSON, CSV, text) |
| Actions | `parsePdf`, `parseJson`, `parseCsv`, `parseText` |
| Tags | parser, pdf, csv, json, documents, tools, agentic |
## Actions

### parsePdf
Parse a PDF document and extract text content and metadata.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `source` | string | Yes | File path or base64-encoded PDF |
| `sourceType` | enum | No | `file` or `base64` (default: `file`) |
Returns:
```json
{
  "text": "extracted text content...",
  "pages": 12,
  "info": {
    "title": "Document Title",
    "author": "Author Name",
    "pages": 12
  }
}
```
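As a quick sketch of supplying a PDF without a file path, any in-memory bytes can be base64-encoded and passed with `sourceType: 'base64'` (the bytes below are a placeholder, not a real PDF):

```typescript
// Hypothetical parsePdf input built from in-memory bytes (e.g. an upload).
// The bytes here are placeholder content, not a parseable PDF document.
const pdfBytes = Buffer.from('%PDF-1.4 placeholder');
const input = {
  action: 'parsePdf',
  source: pdfBytes.toString('base64'),
  sourceType: 'base64',
};
```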
### parseJson
Parse a JSON file or base64-encoded JSON string.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `source` | string | Yes | File path or base64-encoded JSON |
| `sourceType` | enum | No | `file` or `base64` (default: `file`) |
Returns the parsed JSON value directly as `data`.
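For illustration (the payload below is made up), a base64-encoded JSON string round-trips to the same value the action would return as `data`:

```typescript
// Hypothetical parseJson input; the tool decodes the base64 string and
// returns the parsed value directly as `data`.
const input = {
  action: 'parseJson',
  source: Buffer.from('{"region": "EMEA", "total": 42}').toString('base64'),
  sourceType: 'base64',
};

// Decoding mirrors what the action does internally:
const data = JSON.parse(Buffer.from(input.source, 'base64').toString());
```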
### parseCsv
Parse a CSV file with configurable delimiter and header handling.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `source` | string | Yes | File path or base64-encoded CSV |
| `sourceType` | enum | No | `file` or `base64` (default: `file`) |
| `options.delimiter` | string | No | Column delimiter (default: `,`) |
| `options.header` | boolean | No | Whether the first row is a header (default: `true`) |
Returns:
```json
{
  "headers": ["name", "value", "category"],
  "rows": [
    ["alice", "10", "A"],
    ["bob", "20", "B"]
  ],
  "rowCount": 2
}
```
When `header` is false, the `headers` field is `null`.
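A sketch of the `header: false` case (the CSV payload is invented for illustration); per the description above, every row, including the first, lands in `rows` and `headers` is `null`:

```typescript
// Hypothetical parseCsv input with the header row disabled.
const input = {
  action: 'parseCsv',
  source: Buffer.from('alice,10\nbob,20').toString('base64'),
  sourceType: 'base64',
  options: { delimiter: ',', header: false },
};
// Expected shape of `data` (illustrative):
// { headers: null, rows: [['alice', '10'], ['bob', '20']], rowCount: 2 }
```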
### parseText
Parse a plain text file with optional line-based pagination.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `source` | string | Yes | File path or base64-encoded text |
| `sourceType` | enum | No | `file` or `base64` (default: `file`) |
| `options.offset` | number | No | Starting line number (default: 0) |
| `options.limit` | number | No | Maximum number of lines to return |
Returns the selected text content as `data`.
## Output Schema
| Field | Type | Description |
|---|---|---|
| `data` | object | Parsed document content (structure varies by action) |
| `format` | string | Format identifier: `pdf`, `json`, `csv`, or `text` |
| `success` | boolean | `true` on success |
## Configuration Reference
| Property | Type | Default | Description |
|---|---|---|---|
| `allowedDirectories` | string[] | (none) | If set, file paths must be within these directories. Not enforced for the `base64` source type. |
| `maxFileSize` | integer | 50000000 (50 MB) | Maximum file size in bytes. Files exceeding this limit are rejected. |
## Source Types

The `sourceType` parameter controls how the `source` string is interpreted:
| Source Type | Behavior |
|---|---|
| `file` | Treat `source` as a file path. Read the file from disk. Subject to `allowedDirectories` and `maxFileSize` checks. |
| `base64` | Treat `source` as base64-encoded content. Decode it in memory. Not subject to directory restrictions. |
## Safety

### Directory scoping
When `sourceType` is `file`, the file path is resolved and checked against `allowedDirectories`. If the path does not fall within an allowed directory, access is denied.
### File size limits

Before reading a file, its size is checked against `maxFileSize`. Files exceeding the limit are rejected with a descriptive error.
> **Note:** Directory restrictions and file size checks apply only to the `file` source type. Base64 input bypasses directory checks (since there is no file path), but the decoded content is still limited by available memory.
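The checks above could look roughly like the following sketch. Function names and details are illustrative only, not the tool's actual implementation; the key point is that paths are resolved before comparison, so `..` traversal cannot escape an allowed directory, and sizes are checked via `stat` before the file is read:

```typescript
import * as path from 'path';
import * as fs from 'fs';

// Sketch of a directory-scoping check: resolve the path and require it to
// sit at or below one of the allowed directories.
function isWithinAllowed(filePath: string, allowedDirectories: string[]): boolean {
  const resolved = path.resolve(filePath);
  return allowedDirectories.some((dir) => {
    const base = path.resolve(dir);
    return resolved === base || resolved.startsWith(base + path.sep);
  });
}

// Sketch of a size check: stat before reading so oversized files are
// rejected without being loaded into memory.
function isWithinSizeLimit(filePath: string, maxFileSize: number): boolean {
  return fs.statSync(filePath).size <= maxFileSize;
}
```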
## Usage Example
```typescript
import { documentParserNode } from '@flowforgejs/nodes';

const workflow = {
  nodes: [
    {
      id: 'parse-report',
      node: documentParserNode,
      config: {
        allowedDirectories: ['/data/reports'],
        maxFileSize: 10_000_000,
      },
      input: {
        action: 'parseCsv',
        source: '/data/reports/quarterly-sales.csv',
        sourceType: 'file',
        options: {
          delimiter: ',',
          header: true,
        },
      },
    },
  ],
};
```
## Base64 Example
```typescript
// Parse a CSV passed as base64 (e.g. from an API response)
{
  action: 'parseCsv',
  source: Buffer.from('name,score\nalice,95\nbob,87').toString('base64'),
  sourceType: 'base64',
  options: { delimiter: ',', header: true },
}
```
> **Tip:** Use `parseText` with `offset` and `limit` for large log files. This avoids loading the entire file into the LLM context window and allows incremental processing.
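The offset/limit semantics described above can be sketched as follows. This is illustrative only; the tool's internal line handling may differ, but per the parameter table, `offset` is the zero-based starting line and `limit` caps how many lines come back:

```typescript
// Sketch of line-based pagination: offset is the zero-based starting line,
// limit caps the number of lines returned (all remaining lines if omitted).
function sliceLines(text: string, offset = 0, limit?: number): string[] {
  const lines = text.split('\n');
  return limit === undefined
    ? lines.slice(offset)
    : lines.slice(offset, offset + limit);
}
```

Calling the node repeatedly with an increasing `offset` then walks a large file in fixed-size windows.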