Tamizh

Posted on May 5

Scrapebase + Permit.io: Web Scraping with API-First Authorization

#permitio #authorization #api #webdev

This is a submission for the Permit.io Authorization Challenge: Permissions Redefined

What I Built

I built Scrapebase - a web scraping service with tiered access controls that demonstrates API-first authorization using Permit.io. The project separates business logic from authorization concerns using Permit.io's policy-as-code approach.

In many applications, authorization is implemented as an afterthought, resulting in security vulnerabilities and technical debt. Scrapebase demonstrates how to build with authorization as a first-class concern from day one.

Key Features

Tiered Service Levels: Free, Pro, and Admin tiers with different capabilities
API Key Authentication: Simple authentication using API keys
Role-Based Access Control: Permissions managed through Permit.io
Domain Blacklist System: Resource-level restrictions for sensitive domains
Text Processing: Basic and advanced text processing with role-based restrictions

How It Works

The core authentication and authorization flow:

User sends request with x-api-key header
permitAuth middleware intercepts the request
Middleware maps API key to user role (free_user, pro_user, or admin)
User is synced to Permit.io
Permission check runs against Permit.io cloud PDP
Request is allowed or denied based on policy decision

┌──────────┐    ┌───────────────┐    ┌────────────┐    ┌──────────────┐
│  Client  │───▶│ Scrapebase API│───▶│permitAuth  │───▶│  Permit.io   │
│          │◀───│               │◀───│ middleware │◀───│  Cloud PDP   │
└──────────┘    └───────────────┘    └────────────┘    └──────────────┘
     │                                                        ▲
     │                                                        │
     └────────────────────────────────────────────────────────┘
       Permission policies defined in Permit.io dashboard
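
The six steps above can be sketched as one small function. This is a minimal illustration, not the project's actual middleware: the `PolicyClient` interface is a stand-in for the two Permit.io calls the flow relies on, the `KeyEntry` table and the 401 response for unknown keys are assumptions, and Express is left out so the sketch runs on its own.

```typescript
type Role = "free_user" | "pro_user" | "admin";

// Hypothetical lookup entry: which user and role an API key belongs to.
interface KeyEntry {
  userKey: string;
  role: Role;
}

// Minimal request shape; a real Express request carries much more.
interface AuthRequest {
  headers: Record<string, string | undefined>;
  body: { url?: string; advanced?: boolean };
}

// Stand-in for the Permit.io client (an assumption, not the SDK's own types).
interface PolicyClient {
  syncUser(user: { key: string; attributes: { tier: Role } }): Promise<void>;
  check(userKey: string, action: string, resourceType: string): Promise<boolean>;
}

// Steps 1-3: read the x-api-key value and map it to a known user and role.
function resolveUser(
  apiKey: string | undefined,
  keys: Record<string, KeyEntry>
): KeyEntry | undefined {
  if (apiKey === undefined) return undefined;
  return Object.prototype.hasOwnProperty.call(keys, apiKey) ? keys[apiKey] : undefined;
}

// Steps 4-6: sync the user, ask the PDP, and turn the decision into a status code.
async function authorize(
  req: AuthRequest,
  keys: Record<string, KeyEntry>,
  permit: PolicyClient
): Promise<{ status: number; role?: Role }> {
  const user = resolveUser(req.headers["x-api-key"], keys);
  if (user === undefined) return { status: 401 }; // assumed response for unknown keys

  await permit.syncUser({ key: user.userKey, attributes: { tier: user.role } });
  const action = req.body.advanced ? "scrape_advanced" : "scrape_basic";
  const allowed = await permit.check(user.userKey, action, "website");
  return { status: allowed ? 200 : 403, role: user.role };
}
```

Passing the policy client in as a parameter keeps the decision logic testable with a fake client, while the real middleware would pass the Permit.io SDK instance.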

Demo

You can test the API with the following requests:

# Test with free user
curl -X POST http://localhost:8080/api/processLinks \
  -H "Content-Type: application/json" \
  -H "x-api-key: 2025DEVChallenge_free" \
  -d '{"url": "https://example.com"}'

# Test with admin user
curl -X POST http://localhost:8080/api/processLinks \
  -H "Content-Type: application/json" \
  -H "x-api-key: 2025DEVChallenge_admin" \
  -d '{"url": "https://example.com", "advanced": true}'

Project Repo

0xtamizh / scrapebase-permit-IO

Scrapebase with Permit.io Authorization

A powerful web scraping API with fine-grained authorization controls powered by Permit.io. This project demonstrates how to implement sophisticated authorization patterns in a real-world API service.

Features

Tiered Access Control: Different permissions for Free, Pro, and Admin users
Resource-Based Authorization: Control access based on target domains
Rate Limiting: Tier-specific rate limits enforced through policies
Advanced Scraping Features: Premium capabilities restricted to Pro users
Real-time Policy Updates: Changes to permissions take effect immediately
Audit Logging: Track all authorization decisions

Quick Start

Clone the repository:

git clone https://github.com/yourusername/scrapebase-permit
cd scrapebase-permit

Install dependencies:

npm install

Set up environment variables:

cp .env.example .env

Edit .env with your Permit.io API key and other configurations:

PERMIT_API_KEY=your_permit_api_key
ADMIN_API_KEY=2025DEVChallenge_admin
USER_API_KEY=2025DEVChallenge_user

Start the development server:

npm run dev

Visit http://localhost:3000 to access the testing UI

Testing the Authorization Features

Test Credentials

Admin User:

Username: admin
API Key: 2025DEVChallenge_admin

Regular…


My Journey

The Problem with Traditional Authorization

Traditional approaches to authorization often result in permission checks scattered throughout application code, creating maintenance nightmares and security risks. When I started this project, I wanted to demonstrate how modern applications can embrace externalized authorization as a core architectural principle.

I chose to build a web scraping service because it presents meaningful access control requirements:

Tiered service levels that mirror real-world SaaS subscription models
Administrative functions that require elevated permissions
Resource-based restrictions through a domain blacklist system

The Power of API-First Authorization

The key insight that drove this project was the separation of concerns: business logic should be distinct from authorization decisions. By using Permit.io, I was able to:

Define all permission policies in one place
Enforce consistent access control across all endpoints
Update policies without changing application code

The implementation was straightforward - here's the core middleware that powers the authorization flow:

// Map API key to user role
switch (apiKey) {
  case process.env.ADMIN_API_KEY:
    userKey = '2025DEVChallenge_admin';
    tier = 'admin';
    break;
  // ...other keys
}

// Sync user to Permit.io
await permit.api.syncUser({
  key: userKey,
  email: `${userKey}@scrapebase.xyz`,
  attributes: { tier, roles: [tier] }
});

// Check permission for the user key that was just synced
const action = req.body.advanced ? 'scrape_advanced' : 'scrape_basic';
const permissionCheck = await permit.check(userKey, action, 'website');

if (!permissionCheck) {
  return res.status(403).json({
    success: false,
    error: 'Access denied by Permit.io'
  });
}

Challenges Faced

Cloud PDP Limitations

Initially, I tried implementing Attribute-Based Access Control (ABAC) by passing resource attributes:

// This DIDN'T work with cloud PDP
const resource = {
  type: 'website',
  key: hostname,
  attributes: {
    is_blacklisted: isBlacklistedDomain
  }
};

const permissionCheck = await permit.check(user.key, action, resource);

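The cloud PDP returned 501 (Not Implemented) errors because it only evaluates RBAC policies; checks that depend on resource attributes need a self-hosted PDP. I had to simplify to a pure RBAC approach, passing only the resource type: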

// This works with cloud PDP
const permissionCheck = await permit.check(user.key, action, resourceType);
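
One way to keep the domain blacklist meaningful under a pure RBAC check is to evaluate the attribute in application code and then decide which actions to ask the PDP about. The sketch below assumes a hypothetical `access_blacklisted` action on the `website` resource, granted only to admins, and a hypothetical in-memory blacklist; neither is taken from the project, so treat it as one possible workaround rather than how Scrapebase does it.

```typescript
// Hypothetical blacklist; a real service would load this from its blacklist store.
const BLACKLIST = new Set<string>(["internal.example", "bank.example"]);

// True if the hostname, or any parent domain of it, is on the blacklist.
function isBlacklisted(hostname: string, blacklist: Set<string>): boolean {
  const labels = hostname.toLowerCase().split(".");
  for (let i = 0; i < labels.length; i++) {
    if (blacklist.has(labels.slice(i).join("."))) return true;
  }
  return false;
}

// Lists the RBAC actions that must all be allowed for this request.
// "access_blacklisted" is an assumed action name, not one from the original post.
function requiredActions(
  hostname: string,
  advanced: boolean,
  blacklist: Set<string>
): string[] {
  const actions = [advanced ? "scrape_advanced" : "scrape_basic"];
  if (isBlacklisted(hostname, blacklist)) actions.push("access_blacklisted");
  return actions;
}
```

The middleware would then call `permit.check(userKey, action, 'website')` once per returned action and deny the request if any check fails, so the attribute never has to reach the cloud PDP.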

Role Assignment

Another challenge was ensuring roles were properly synchronized and recognized. The solution was two-fold:

Properly sync users with their role information
Manually configure role permissions in the Permit.io dashboard
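
The first half of that fix can be sketched as a helper that syncs the user and then assigns the role explicitly, rather than relying on a `roles` attribute alone. The `RoleSyncClient` interface is a stand-in modeled on the SDK's `syncUser` and `assignRole` calls, and the `default` tenant is an assumption; check both against the SDK version you use.

```typescript
type Role = "free_user" | "pro_user" | "admin";

// Stand-in for the part of the Permit.io API client used here.
// This is an assumption for illustration, not the SDK's own type definitions.
interface RoleSyncClient {
  syncUser(user: {
    key: string;
    email: string;
    attributes: Record<string, unknown>;
  }): Promise<unknown>;
  assignRole(assignment: { user: string; role: string; tenant: string }): Promise<unknown>;
}

// Syncs the user, then assigns the role so RBAC checks can see it.
// A real client may reject a duplicate assignment; handling that is left out.
async function ensureUserWithRole(
  client: RoleSyncClient,
  userKey: string,
  role: Role,
  tenant: string = "default" // assumed tenant key
): Promise<void> {
  await client.syncUser({
    key: userKey,
    email: `${userKey}@scrapebase.xyz`,
    attributes: { tier: role },
  });
  await client.assignRole({ user: userKey, role, tenant });
}
```

With the real SDK, the object passed as `client` would presumably be `permit.api`; the permissions attached to each role still have to be configured in the dashboard.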

Using Permit.io for Authorization

Setting up Permit.io involved these key steps:

Creating a project in the Permit.io dashboard
Defining resources (website), actions (scrape_basic, scrape_advanced), and roles (free_user, pro_user, admin)
Configuring the permission matrix in the dashboard
Integrating the Permit.io SDK into my application

Here's the role-based capability matrix I implemented:

Feature	Free User	Pro User	Admin
Basic Scraping	✅	✅	✅
Advanced Scraping	❌	✅	✅
Text Cleaning	✅	✅	✅
AI Summarization	❌	✅	✅
View Blacklist	✅	✅	✅
Manage Blacklist	❌	❌	✅
Access Blacklisted Domains	❌	❌	✅
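
For tests or documentation, the matrix can also be mirrored as data. This is only an illustrative sketch: the feature identifiers are hypothetical names for the table rows, and the source of truth remains the policy configured in the Permit.io dashboard.

```typescript
type Role = "free_user" | "pro_user" | "admin";

// Hypothetical identifiers, one per row of the capability matrix.
type Feature =
  | "basic_scraping"
  | "advanced_scraping"
  | "text_cleaning"
  | "ai_summarization"
  | "view_blacklist"
  | "manage_blacklist"
  | "access_blacklisted_domains";

// Roles allowed to use each feature, copied from the table above.
const CAPABILITIES: Record<Feature, readonly Role[]> = {
  basic_scraping: ["free_user", "pro_user", "admin"],
  advanced_scraping: ["pro_user", "admin"],
  text_cleaning: ["free_user", "pro_user", "admin"],
  ai_summarization: ["pro_user", "admin"],
  view_blacklist: ["free_user", "pro_user", "admin"],
  manage_blacklist: ["admin"],
  access_blacklisted_domains: ["admin"],
};

// Looks up whether a role has a feature according to the local mirror.
function can(role: Role, feature: Feature): boolean {
  return CAPABILITIES[feature].indexOf(role) !== -1;
}
```

A mirror like this is useful for asserting in integration tests that the dashboard policy and the documented matrix still agree; it should not replace the `permit.check` call at runtime.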

Permission Enforcement

Permissions are enforced in two places:

The permitAuth middleware for API endpoints:

   const permissionCheck = await permit.check(user.key, action, 'website');
   if (!permissionCheck) {
     return res.status(403).json({ success: false, error: 'Access denied' });
   }

Directly in route handlers for specific features:

   // src/routes/summarize.ts
   if (summarize) {
     const userTier = req.user?.attributes?.tier;
     if (userTier !== 'pro_user' && userTier !== 'admin') {
       return res.status(403).json({
         success: false,
         error: 'Access denied',
         details: 'Text summarization is only available for Pro and Admin users'
       });
     }
   }

What I Learned

Building Scrapebase with Permit.io taught me how to:

Separate authorization concerns from business logic
Implement role-based access control with external policy management
Design a flexible permission system that doesn't require code changes to update policies

The advantages of this approach are clear:

Separation of concerns: Business logic remains focused on core functionality while authorization is handled externally
Adaptable policies: Permissions can be updated without code changes or redeployments
Consistent enforcement: Authorization decisions follow the same rules across all application endpoints
Improved security: Centralized policy management reduces the risk of inconsistent permission checks
Developer experience: Cleaner codebase with reduced authorization-related complexity

This externalized approach enables business stakeholders to manage authorization policies directly through the Permit.io dashboard, while developers focus on building features - the hallmark of a well-designed API-first authorization system.

Future Improvements

With more time, I would:

Set up a local PDP to enable ABAC with resource attributes
Implement tenant isolation for multi-tenant support
Add UI components in the admin dashboard to view permission audit logs
Create more granular roles and permissions beyond the three tiers
Add a user management section to assign roles through the UI

Scrapebase demonstrates how modern SaaS apps can delegate complex authorization to a specialized service like Permit.io, allowing developers to focus on core features while maintaining robust access controls.
