Files
Zachary D. Rowitsch 931f17b70e
All checks were successful
ci / fast (linux) (push) Successful in 6m46s
Add BMAD framework, planning artifacts, and architecture decision document
Install BMAD workflow framework with agent commands and templates.
Create product brief, PRD, project context, and architecture decision
document covering networking/persistence strategy, JS engine evolution
path, threading model, web_api scaling, system integration, and
tab/process model. Add generated project documentation (architecture
overview, component inventory, development guide, source tree analysis).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 15:13:06 -04:00

457 lines
13 KiB
Markdown

# File Utilities
## Principle
Read and validate files (CSV, XLSX, PDF, ZIP) with automatic parsing, type-safe results, and download handling. Simplify file operations in Playwright tests with built-in format support and validation helpers.
## Rationale
Testing file operations in Playwright requires boilerplate:
- Manual download handling
- External parsing libraries for each format
- No validation helpers
- Type-unsafe results
- Repetitive path handling
The `file-utils` module provides:
- **Auto-parsing**: CSV, XLSX, PDF, ZIP automatically parsed
- **Download handling**: Single function for UI or API-triggered downloads
- **Type-safe**: TypeScript interfaces for parsed results
- **Validation helpers**: Row count, header checks, content validation
- **Format support**: Multiple sheet support (XLSX), text extraction (PDF), archive extraction (ZIP)
## Why Use This Instead of Vanilla Playwright?
| Vanilla Playwright | File Utils |
| ------------------------------------------- | ------------------------------------------------ |
| ~80 lines per CSV flow (download + parse) | ~10 lines end-to-end |
| Manual event orchestration for downloads | Encapsulated in `handleDownload()` |
| Manual path handling and `saveAs` | Returns a ready-to-use file path |
| Manual existence checks and error handling | Centralized in one place via utility patterns |
| Manual CSV parsing config (headers, typing) | `readCSV()` returns `{ data, headers }` directly |
## Pattern Examples
### Example 1: UI-Triggered CSV Download
**Context**: User clicks button, CSV downloads, validate contents.
**Implementation**:
```typescript
import { handleDownload, readCSV } from '@seontechnologies/playwright-utils/file-utils';
import path from 'node:path';
const DOWNLOAD_DIR = path.join(__dirname, '../downloads');
test('should download and validate CSV', async ({ page }) => {
const downloadPath = await handleDownload({
page,
downloadDir: DOWNLOAD_DIR,
trigger: () => page.getByTestId('download-button-text/csv').click(),
});
const csvResult = await readCSV({ filePath: downloadPath });
// Access parsed data and headers
const { data, headers } = csvResult.content;
expect(headers).toEqual(['ID', 'Name', 'Email']);
expect(data[0]).toMatchObject({
ID: expect.any(String),
Name: expect.any(String),
Email: expect.any(String),
});
});
```
**Key Points**:
- `handleDownload` waits for download, returns file path
- `readCSV` auto-parses to `{ headers, data }`
- Type-safe access to parsed content
- Clean up downloads in `afterEach`
### Example 2: XLSX with Multiple Sheets
**Context**: Excel file with multiple sheets (e.g., Summary, Details, Errors).
**Implementation**:
```typescript
import { readXLSX } from '@seontechnologies/playwright-utils/file-utils';
test('should read multi-sheet XLSX', async () => {
const downloadPath = await handleDownload({
page,
downloadDir: DOWNLOAD_DIR,
trigger: () => page.click('[data-testid="export-xlsx"]'),
});
const xlsxResult = await readXLSX({ filePath: downloadPath });
// Verify worksheet structure
expect(xlsxResult.content.worksheets.length).toBeGreaterThan(0);
const worksheet = xlsxResult.content.worksheets[0];
expect(worksheet).toBeDefined();
expect(worksheet).toHaveProperty('name');
// Access sheet data
const sheetData = worksheet?.data;
expect(Array.isArray(sheetData)).toBe(true);
// Use type assertion for type safety
const firstRow = sheetData![0] as Record<string, unknown>;
expect(firstRow).toHaveProperty('id');
});
```
**Key Points**:
- `worksheets` array with `name` and `data` properties
- Access sheets by name
- Each sheet has its own headers and data
- Type-safe sheet iteration
### Example 3: PDF Text Extraction
**Context**: Validate PDF report contains expected content.
**Implementation**:
```typescript
import { readPDF } from '@seontechnologies/playwright-utils/file-utils';
test('should validate PDF report', async () => {
const downloadPath = await handleDownload({
page,
downloadDir: DOWNLOAD_DIR,
trigger: () => page.getByTestId('download-button-Text-based PDF Document').click(),
});
const pdfResult = await readPDF({ filePath: downloadPath });
// content is extracted text from all pages
expect(pdfResult.pagesCount).toBe(1);
expect(pdfResult.fileName).toContain('.pdf');
expect(pdfResult.content).toContain('All you need is the free Adobe Acrobat Reader');
});
```
**PDF Reader Options:**
```typescript
const result = await readPDF({
filePath: '/path/to/document.pdf',
mergePages: false, // Keep pages separate (default: true)
debug: true, // Enable debug logging
maxPages: 10, // Limit processing to first 10 pages
});
```
**Important Limitation - Vector-based PDFs:**
Text extraction may fail for PDFs that store text as vector graphics (e.g., those generated by jsPDF):
```typescript
// Vector-based PDF example (extraction fails gracefully)
const pdfResult = await readPDF({ filePath: downloadPath });
expect(pdfResult.pagesCount).toBe(1);
expect(pdfResult.info.extractionNotes).toContain('Text extraction from vector-based PDFs is not supported.');
```
Such PDFs will have:
- `textExtractionSuccess: false`
- `isVectorBased: true`
- Explanatory message in `extractionNotes`
### Example 4: ZIP Archive Validation
**Context**: Validate ZIP contains expected files and extract specific file.
**Implementation**:
```typescript
import { readZIP } from '@seontechnologies/playwright-utils/file-utils';
test('should validate ZIP archive', async () => {
const downloadPath = await handleDownload({
page,
downloadDir: DOWNLOAD_DIR,
trigger: () => page.click('[data-testid="download-backup"]'),
});
const zipResult = await readZIP({ filePath: downloadPath });
// Check file list
expect(Array.isArray(zipResult.content.entries)).toBe(true);
expect(zipResult.content.entries).toContain('Case_53125_10-19-22_AM/Case_53125_10-19-22_AM_case_data.csv');
// Extract specific file
const targetFile = 'Case_53125_10-19-22_AM/Case_53125_10-19-22_AM_case_data.csv';
const zipWithExtraction = await readZIP({
filePath: downloadPath,
fileToExtract: targetFile,
});
// Access extracted file buffer
const extractedFiles = zipWithExtraction.content.extractedFiles || {};
const fileBuffer = extractedFiles[targetFile];
expect(fileBuffer).toBeInstanceOf(Buffer);
expect(fileBuffer?.length).toBeGreaterThan(0);
});
```
**Key Points**:
- `content.entries` lists all files in archive
- `fileToExtract` extracts specific files to Buffer
- Validate archive structure
- Read and parse individual files from ZIP
### Example 5: API-Triggered Download
**Context**: API endpoint returns file download (not UI click).
**Implementation**:
```typescript
test('should download via API', async ({ page, request }) => {
const downloadPath = await handleDownload({
page, // Still need page for download events
downloadDir: DOWNLOAD_DIR,
trigger: async () => {
const response = await request.get('/api/export/csv', {
headers: { Authorization: 'Bearer token' },
});
if (!response.ok()) {
throw new Error(`Export failed: ${response.status()}`);
}
},
});
const { content } = await readCSV({ filePath: downloadPath });
expect(content.data).toHaveLength(100);
});
```
**Key Points**:
- `trigger` can be async API call
- API must return `Content-Disposition` header
- Still need `page` for download events
- Works with authenticated endpoints
### Example 6: Reading CSV from Buffer (ZIP extraction)
**Context**: Read CSV content directly from a Buffer (e.g., extracted from ZIP).
**Implementation**:
```typescript
// Read from a Buffer (e.g., extracted from a ZIP)
const zipResult = await readZIP({
filePath: 'archive.zip',
fileToExtract: 'data.csv',
});
const fileBuffer = zipResult.content.extractedFiles?.['data.csv'];
const csvFromBuffer = await readCSV({ content: fileBuffer });
// Read from a string
const csvString = 'name,age\nJohn,30\nJane,25';
const csvFromString = await readCSV({ content: csvString });
const { data, headers } = csvFromString.content;
expect(headers).toContain('name');
expect(headers).toContain('age');
```
## API Reference
### CSV Reader Options
| Option | Type | Default | Description |
| -------------- | ------------------ | -------- | -------------------------------------- |
| `filePath` | `string` | - | Path to CSV file (mutually exclusive) |
| `content` | `string \| Buffer` | - | Direct content (mutually exclusive) |
| `delimiter` | `string \| 'auto'` | `','` | Value separator, auto-detect if 'auto' |
| `encoding` | `string` | `'utf8'` | File encoding |
| `parseHeaders` | `boolean` | `true` | Use first row as headers |
| `trim` | `boolean` | `true` | Trim whitespace from values |
### XLSX Reader Options
| Option | Type | Description |
| ----------- | -------- | ------------------------------ |
| `filePath` | `string` | Path to XLSX file |
| `sheetName` | `string` | Name of sheet to set as active |
### PDF Reader Options
| Option | Type | Default | Description |
| ------------ | --------- | ------- | --------------------------- |
| `filePath` | `string` | - | Path to PDF file (required) |
| `mergePages` | `boolean` | `true` | Merge text from all pages |
| `maxPages` | `number` | - | Maximum pages to extract |
| `debug` | `boolean` | `false` | Enable debug logging |
### ZIP Reader Options
| Option | Type | Description |
| --------------- | -------- | ---------------------------------- |
| `filePath` | `string` | Path to ZIP file |
| `fileToExtract` | `string` | Specific file to extract to Buffer |
### Return Values
#### CSV Reader Return Value
```typescript
{
content: {
data: Array<Array<string | number>>, // Parsed rows (excludes header row if parseHeaders: true)
headers: string[] | null // Column headers (null if parseHeaders: false)
}
}
```
#### XLSX Reader Return Value
```typescript
{
content: {
worksheets: Array<{
name: string; // Sheet name
rows: Array<Array<any>>; // All rows including headers
headers?: string[]; // First row as headers (if present)
}>;
}
}
```
#### PDF Reader Return Value
```typescript
{
content: string, // Extracted text (merged or per-page based on mergePages)
pagesCount: number, // Total pages in PDF
fileName?: string, // Original filename if available
info?: Record<string, any> // PDF metadata (author, title, etc.)
}
```
> **Note**: When `mergePages: false`, `content` is an array of strings (one per page). When `maxPages` is set, only that many pages are extracted.
#### ZIP Reader Return Value
```typescript
{
content: {
entries: Array<{
name: string, // File/directory path within ZIP
size: number, // Uncompressed size in bytes
isDirectory: boolean // True for directories
}>,
extractedFiles: Record<string, Buffer | string> // Extracted file contents by path
}
}
```
> **Note**: When `fileToExtract` is specified, only that file appears in `extractedFiles`.
## Download Cleanup Pattern
```typescript
test.afterEach(async () => {
// Clean up downloaded files
await fs.remove(DOWNLOAD_DIR);
});
```
## Comparison with Vanilla Playwright
Vanilla Playwright (real test) snippet:
```typescript
// ~80 lines of boilerplate!
const [download] = await Promise.all([page.waitForEvent('download'), page.getByTestId('download-button-CSV Export').click()]);
const failure = await download.failure();
expect(failure).toBeNull();
const filePath = testInfo.outputPath(download.suggestedFilename());
await download.saveAs(filePath);
await expect
.poll(
async () => {
try {
await fs.access(filePath);
return true;
} catch {
return false;
}
},
{ timeout: 5000, intervals: [100, 200, 500] },
)
.toBe(true);
const csvContent = await fs.readFile(filePath, 'utf-8');
const parseResult = parse(csvContent, {
header: true,
skipEmptyLines: true,
dynamicTyping: true,
transformHeader: (header: string) => header.trim(),
});
if (parseResult.errors.length > 0) {
throw new Error(`CSV parsing errors: ${JSON.stringify(parseResult.errors)}`);
}
const data = parseResult.data as Array<Record<string, unknown>>;
const headers = parseResult.meta.fields || [];
```
With File Utils, the same flow becomes:
```typescript
const downloadPath = await handleDownload({
page,
downloadDir: DOWNLOAD_DIR,
trigger: () => page.getByTestId('download-button-text/csv').click(),
});
const { data, headers } = (await readCSV({ filePath: downloadPath })).content;
```
## Related Fragments
- `overview.md` - Installation and imports
- `api-request.md` - API-triggered downloads
- `recurse.md` - Poll for file generation completion
## Anti-Patterns
**DON'T leave downloads in place:**
```typescript
test('creates file', async () => {
await handleDownload({ ... })
// File left in downloads folder
})
```
**DO clean up after tests:**
```typescript
test.afterEach(async () => {
await fs.remove(DOWNLOAD_DIR);
});
```