Tamizh
Scrapebase + Permit.io: Web Scraping with API-First Authorization

This is a submission for the Permit.io Authorization Challenge: Permissions Redefined

What I Built

I built Scrapebase - a web scraping service with tiered access controls that demonstrates API-first authorization using Permit.io. The project separates business logic from authorization concerns using Permit.io's policy-as-code approach.

In many applications, authorization is implemented as an afterthought, resulting in security vulnerabilities and technical debt. Scrapebase demonstrates how to build with authorization as a first-class concern from day one.

Key Features

  • Tiered Service Levels: Free, Pro, and Admin tiers with different capabilities
  • API Key Authentication: Simple authentication using API keys
  • Role-Based Access Control: Permissions managed through Permit.io
  • Domain Blacklist System: Resource-level restrictions for sensitive domains
  • Text Processing: Basic and advanced text processing with role-based restrictions

How It Works

The core authentication and authorization flow:

  1. User sends request with x-api-key header
  2. permitAuth middleware intercepts the request
  3. Middleware maps API key to user role (free_user, pro_user, or admin)
  4. User is synced to Permit.io
  5. Permission check runs against Permit.io cloud PDP
  6. Request is allowed or denied based on policy decision
┌──────────┐    ┌───────────────┐    ┌────────────┐    ┌──────────────┐
│  Client  │───▶│ Scrapebase API│───▶│permitAuth  │───▶│  Permit.io   │
│          │◀───│               │◀───│ middleware │◀───│  Cloud PDP   │
└──────────┘    └───────────────┘    └────────────┘    └──────────────┘
     │                                                        ▲
     │                                                        │
     └────────────────────────────────────────────────────────┘
       Permission policies defined in Permit.io dashboard
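
As a rough sketch of the six steps above, the flow can be written as a framework-agnostic function. The names `resolveRole`, `authorize`, and `CheckFn` are hypothetical, the key-to-role table is supplied by the caller, and the injected `check` function stands in for the Permit.io call. The user sync in step 4 is left out for brevity.

```typescript
// Roles and actions named in the post.
type Role = "free_user" | "pro_user" | "admin";
type Action = "scrape_basic" | "scrape_advanced";

// Stand-in for the PDP call (in the post: permit.check(user.key, action, 'website')).
type CheckFn = (userKey: string, action: Action, resource: string) => Promise<boolean>;

interface AuthResult {
  status: 200 | 401 | 403;
  role?: Role;
}

// Steps 1-3: map the x-api-key header value to a role (undefined for unknown keys).
function resolveRole(apiKey: string | undefined, keys: Map<string, Role>): Role | undefined {
  return apiKey === undefined ? undefined : keys.get(apiKey);
}

// Steps 5-6: ask the policy engine and translate its decision into an HTTP status.
// For simplicity this sketch uses the API key itself as the user key.
async function authorize(
  apiKey: string | undefined,
  advanced: boolean,
  keys: Map<string, Role>,
  check: CheckFn
): Promise<AuthResult> {
  const role = resolveRole(apiKey, keys);
  if (apiKey === undefined || role === undefined) {
    return { status: 401 }; // missing or unknown API key
  }
  const action: Action = advanced ? "scrape_advanced" : "scrape_basic";
  const allowed = await check(apiKey, action, "website");
  return { status: allowed ? 200 : 403, role };
}
```

Injecting the check function keeps the flow testable without network access; in the real service it would wrap the Permit.io SDK call.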

Demo

Scrapebase Demo

You can test the API using the following endpoints:

# Test with free user
curl -X POST http://localhost:8080/api/processLinks \
  -H "Content-Type: application/json" \
  -H "x-api-key: 2025DEVChallenge_free" \
  -d '{"url": "https://example.com"}'

# Test with admin user
curl -X POST http://localhost:8080/api/processLinks \
  -H "Content-Type: application/json" \
  -H "x-api-key: 2025DEVChallenge_admin" \
  -d '{"url": "https://example.com", "advanced": true}'

Project Repo

Scrapebase with Permit.io Authorization

A powerful web scraping API with fine-grained authorization controls powered by Permit.io. This project demonstrates how to implement sophisticated authorization patterns in a real-world API service.

Features

  • Tiered Access Control: Different permissions for Free, Pro, and Admin users
  • Resource-Based Authorization: Control access based on target domains
  • Rate Limiting: Tier-specific rate limits enforced through policies
  • Advanced Scraping Features: Premium capabilities restricted to Pro users
  • Real-time Policy Updates: Changes to permissions take effect immediately
  • Audit Logging: Track all authorization decisions

Quick Start

  1. Clone the repository:
git clone https://github.com/yourusername/scrapebase-permit
cd scrapebase-permit
  2. Install dependencies:
npm install
  3. Set up environment variables:
cp .env.example .env

Edit .env with your Permit.io API key and other configurations:

PERMIT_API_KEY=your_permit_api_key
ADMIN_API_KEY=2025DEVChallenge_admin
USER_API_KEY=2025DEVChallenge_user
  4. Start the development server:
npm run dev
  5. Visit http://localhost:3000 to access the testing UI
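
A service like this can fail in confusing ways when one of the variables above is missing, so a small startup check may help. The following is a minimal sketch, not code from the repository: `loadConfig` and `AppConfig` are hypothetical names, and only the three variable names come from the .env listing above.

```typescript
// Variable names come from the .env listing; the validation itself is an assumption.
interface AppConfig {
  permitApiKey: string;
  adminApiKey: string;
  userApiKey: string;
}

// Reads the required variables from an env-like object and fails fast if any are absent.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const required = ["PERMIT_API_KEY", "ADMIN_API_KEY", "USER_API_KEY"] as const;
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    permitApiKey: env["PERMIT_API_KEY"] as string,
    adminApiKey: env["ADMIN_API_KEY"] as string,
    userApiKey: env["USER_API_KEY"] as string,
  };
}
```

In a Node.js process this would typically be called once at startup as `loadConfig(process.env)`.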

Testing the Authorization Features

Test Credentials

Admin User:

  • Username: admin
  • API Key: 2025DEVChallenge_admin

Regular




My Journey

The Problem with Traditional Authorization

Traditional approaches to authorization often result in permission checks scattered throughout application code, creating maintenance nightmares and security risks. When I started this project, I wanted to demonstrate how modern applications can embrace externalized authorization as a core architectural principle.

I chose to build a web scraping service because it presents meaningful access control requirements:

  1. Tiered service levels that mirror real-world SaaS subscription models
  2. Administrative functions that require elevated permissions
  3. Resource-based restrictions through a domain blacklist system

The Power of API-First Authorization

The key insight that drove this project was the separation of concerns: business logic should be distinct from authorization decisions. By using Permit.io, I was able to:

  1. Define all permission policies in one place
  2. Enforce consistent access control across all endpoints
  3. Update policies without changing application code

The implementation was straightforward - here's the core middleware that powers the authorization flow:

// Map API key to user role
switch (apiKey) {
  case process.env.ADMIN_API_KEY:
    userKey = '2025DEVChallenge_admin';
    tier = 'admin';
    break;
  // ...other keys
}

// Sync user to Permit.io
await permit.api.syncUser({
  key: userKey,
  email: `${userKey}@scrapebase.xyz`,
  attributes: { tier, roles: [tier] }
});

// Check permission
const action = req.body.advanced ? 'scrape_advanced' : 'scrape_basic';
const permissionCheck = await permit.check(user.key, action, 'website');

if (!permissionCheck) {
  return res.status(403).json({
    success: false,
    error: 'Access denied by Permit.io'
  });
}
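
The excerpt above does not show what happens when the permission check itself fails, for example on a network error. One common choice is to fail closed, so that an unreachable PDP results in a denial. The sketch below assumes the check is passed in as a function; `checkOrDeny` and `Decision` are hypothetical names and are not part of the Permit.io SDK.

```typescript
// Outcome of a permission check, including the case where the PDP call itself failed.
interface Decision {
  allowed: boolean;
  reason: "granted" | "denied" | "pdp_error";
}

// Runs the supplied check and treats any thrown error as a denial (fail closed).
async function checkOrDeny(check: () => Promise<boolean>): Promise<Decision> {
  try {
    const allowed = await check();
    return { allowed, reason: allowed ? "granted" : "denied" };
  } catch {
    // Never grant access just because the policy engine could not be reached.
    return { allowed: false, reason: "pdp_error" };
  }
}
```

A caller could wrap the SDK call as `checkOrDeny(() => permit.check(user.key, action, 'website'))` and use the `reason` field to tell a policy denial apart from an infrastructure problem in logs.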

Challenges Faced

Cloud PDP Limitations

Initially, I tried implementing Attribute-Based Access Control (ABAC) by passing resource attributes:

// This DIDN'T work with cloud PDP
const resource = {
  type: 'website',
  key: hostname,
  attributes: {
    is_blacklisted: isBlacklistedDomain
  }
};

const permissionCheck = await permit.check(user.key, action, resource);

The cloud PDP returned 501 (Not Implemented) errors because it only supports basic RBAC and does not evaluate resource attributes. I had to simplify to a pure RBAC approach that passes only the resource type:

// This works with cloud PDP
const permissionCheck = await permit.check(user.key, action, resourceType);
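
With resource attributes unavailable in the policy check, one option is to evaluate the blacklist in application code and combine it with the RBAC result. The following is a sketch under that assumption, not the repository's implementation: `normalizeHost` and `decide` are hypothetical helpers, the subdomain matching rule is an illustrative choice, and whether a role may bypass the blacklist is passed in as a flag.

```typescript
// Normalize a hostname for comparison: trim, lower-case, and drop a trailing dot.
function normalizeHost(hostname: string): string {
  return hostname.trim().toLowerCase().replace(/\.$/, "");
}

// True when the host equals a blacklisted domain or is a subdomain of one.
function isBlacklistedDomain(hostname: string, blacklist: readonly string[]): boolean {
  const host = normalizeHost(hostname);
  return blacklist.some((entry) => {
    const domain = normalizeHost(entry);
    return host === domain || host.endsWith("." + domain);
  });
}

// Combine the RBAC result with the local blacklist check.
// canBypass says whether the caller's role may access blacklisted domains.
function decide(
  rbacAllowed: boolean,
  hostname: string,
  blacklist: readonly string[],
  canBypass: boolean
): boolean {
  if (!rbacAllowed) return false;
  return canBypass || !isBlacklistedDomain(hostname, blacklist);
}
```

Matching on a leading dot avoids treating `notexample.com` as a subdomain of `example.com`.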

Role Assignment

Another challenge was ensuring roles were properly synchronized and recognized. The solution was two-fold:

  1. Properly sync users with their role information
  2. Manually configure role permissions in the Permit.io dashboard

Using Permit.io for Authorization

Setting up Permit.io involved these key steps:

  1. Creating a project in the Permit.io dashboard
  2. Defining resources (website), actions (scrape_basic, scrape_advanced), and roles (free_user, pro_user, admin)
  3. Configuring the permission matrix in the dashboard
  4. Integrating the Permit.io SDK into my application

Here's the role-based capability matrix I implemented:

Feature Free User Pro User Admin
Basic Scraping
Advanced Scraping
Text Cleaning
AI Summarization
View Blacklist
Manage Blacklist
Access Blacklisted Domains
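
For unit tests that should not call the PDP, the resource, actions, and roles from the setup steps can be mirrored in a small typed model. This is only a sketch: the grants below are placeholders chosen for illustration, not a copy of the dashboard configuration, and `exampleGrants` and `isGranted` are hypothetical names.

```typescript
// Roles and actions as defined in the Permit.io project described in the post.
const ROLES = ["free_user", "pro_user", "admin"] as const;
const ACTIONS = ["scrape_basic", "scrape_advanced"] as const;

type RoleKey = typeof ROLES[number];
type ActionKey = typeof ACTIONS[number];

// Placeholder grants for local tests only (assumption); the real matrix
// is configured in the Permit.io dashboard.
const exampleGrants: Record<RoleKey, readonly ActionKey[]> = {
  free_user: ["scrape_basic"],
  pro_user: ["scrape_basic", "scrape_advanced"],
  admin: ["scrape_basic", "scrape_advanced"],
};

// Local stand-in for a policy decision on the 'website' resource.
function isGranted(
  role: RoleKey,
  action: ActionKey,
  grants: Record<RoleKey, readonly ActionKey[]> = exampleGrants
): boolean {
  return grants[role].indexOf(action) !== -1;
}
```

Because the union types are derived from the constant arrays, a typo in a role or action key is caught at compile time.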

Permission Enforcement

Permissions are enforced in two places:

  1. The permitAuth middleware for API endpoints:
   const permissionCheck = await permit.check(user.key, action, 'website');
   if (!permissionCheck) {
     return res.status(403).json({ success: false, error: 'Access denied' });
   }
  2. Directly in route handlers for specific features:
   // src/routes/summarize.ts
   if (summarize) {
     const userTier = req.user?.attributes?.tier;
     if (userTier !== 'pro_user' && userTier !== 'admin') {
       return res.status(403).json({
         success: false,
         error: 'Access denied',
         details: 'Text summarization is only available for Pro and Admin users'
       });
     }
   }
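
If more route handlers need tier checks like the one above, the comparison could be pulled into a small reusable guard. This is a hypothetical refactoring sketch, not code from the repository: `requireTier` and `TierResult` are invented names, and the function returns a plain result object so that it stays independent of the web framework.

```typescript
type UserTier = "free_user" | "pro_user" | "admin";

// Either the request may proceed, or a 403 payload shaped like the one in the post.
type TierResult =
  | { ok: true }
  | { ok: false; status: 403; body: { success: false; error: string; details: string } };

// Returns ok when the user's tier is in the allowed list, otherwise a 403 description.
function requireTier(
  userTier: string | undefined,
  allowed: readonly UserTier[],
  feature: string
): TierResult {
  const tiers: readonly string[] = allowed;
  if (userTier !== undefined && tiers.indexOf(userTier) !== -1) {
    return { ok: true };
  }
  return {
    ok: false,
    status: 403,
    body: {
      success: false,
      error: "Access denied",
      details: `${feature} is not available for your tier`,
    },
  };
}
```

A handler could then call `requireTier(req.user?.attributes?.tier, ['pro_user', 'admin'], 'Text summarization')` and send `result.body` with `result.status` when `ok` is false.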

What I Learned

Building Scrapebase with Permit.io taught me how to:

  1. Separate authorization concerns from business logic
  2. Implement role-based access control with external policy management
  3. Design a flexible permission system that doesn't require code changes to update policies

The advantages of this approach are clear:

  1. Separation of concerns: Business logic remains focused on core functionality while authorization is handled externally
  2. Adaptable policies: Permissions can be updated without code changes or redeployments
  3. Consistent enforcement: Authorization decisions follow the same rules across all application endpoints
  4. Improved security: Centralized policy management reduces the risk of inconsistent permission checks
  5. Developer experience: Cleaner codebase with reduced authorization-related complexity

This externalized approach enables business stakeholders to manage authorization policies directly through the Permit.io dashboard, while developers focus on building features - the hallmark of a well-designed API-first authorization system.

Future Improvements

With more time, I would:

  1. Set up a local PDP to enable ABAC with resource attributes
  2. Implement tenant isolation for multi-tenant support
  3. Add UI components in the admin dashboard to view permission audit logs
  4. Create more granular roles and permissions beyond the three tiers
  5. Add a user management section to assign roles through the UI

Scrapebase demonstrates how modern SaaS apps can delegate complex authorization to a specialized service like Permit.io, allowing developers to focus on core features while maintaining robust access controls.
