Website Scraping With Portia On Linode A Step-By-Step Guide To Persistent Data and File Transfer

1. To scrape a website using Portia on a Linode server, start the Portia Docker container and create a Docker volume to ensure data persists even if the container stops.
2. Access the Portia container and start scraping a website using the portiacrawl command.
3. Copy the output file from the Docker container to the Linode server, then transfer it from the server to a local machine using scp.
4. Monitor the growth of the output file size using commands like ls and watch inside the Docker container.

Uploaded by 45zmsp6h2g
Copyright: © All Rights Reserved


Here's a summary of the steps and commands used, focusing on how to scrape a website using Portia on a Linode server, how to save your work even if the container stops, how to transfer the final file from the Docker volume to the server and then to a macOS or Windows machine, and how to monitor the file growth:

Starting Portia for Web Scraping:


1. SSH into Linode Server:
Access your Linode server via SSH:

ssh username@server_ip

2. Run Portia Docker Container:


Start the Portia container. If a volume is not set up yet, you can run:

docker run -d -p 9001:9001 scrapinghub/portia

This command runs Portia in detached mode and maps port 9001 of the
container to port 9001 of the host.
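
Running the command above twice fails the second time because host port 9001 is already taken. As a minimal sketch, the helper below starts Portia only when no container created from the image is running. The function name start_portia is hypothetical and not part of Portia or Docker; it relies only on the docker ps --quiet and --filter ancestor= options.

```shell
# Hypothetical helper (not part of Portia): start the container only once.
start_portia() {
    # IDs of running containers created from the scrapinghub/portia image.
    running_id=$(docker ps --quiet --filter "ancestor=scrapinghub/portia")
    if [ -n "$running_id" ]; then
        # Already running: print the existing container ID instead of starting another.
        echo "$running_id"
    else
        # Not running: start it detached; docker prints the new container ID.
        docker run -d -p 9001:9001 scrapinghub/portia
    fi
}

# Usage (on the Linode server):
#   start_portia
```

Either way the function prints a container ID, which can be used wherever [container_id] appears in the later steps.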

Ensuring Data Persistence:


1. Create a Docker Volume:
To keep your work when the container is stopped or removed, create a Docker volume:

docker volume create portia_data

2. Run Container with Volume:


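Start the container with the volume attached. If a Portia container is already running, stop it first with docker stop [container_id], because host port 9001 can only be bound once: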

docker run -d -v portia_data:/app/data -p 9001:9001 scrapinghub/portia

This mounts the portia_data volume to /app/data in the container.
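
Before starting a long crawl, it can be worth confirming that the volume really is attached. The sketch below is one way to check, assuming the volume name portia_data and the mount point /app/data used above. The function name has_data_volume is hypothetical; docker inspect --format and the Mounts fields Name and Destination are standard Docker.

```shell
# Hypothetical helper: succeed only if the container mounts portia_data at /app/data.
has_data_volume() {
    # Print one "name destination" line per mount, then look for an exact match.
    docker inspect \
        --format '{{range .Mounts}}{{println .Name .Destination}}{{end}}' "$1" \
        | grep -qx 'portia_data /app/data'
}

# Usage (replace [container_id] with the real ID):
#   has_data_volume [container_id] && echo "volume attached" || echo "volume missing"
```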


Scraping a Website:
1. Access Portia Container:
Enter the running container:

docker exec -it [container_id] /bin/bash

2. Start Scraping with Portia:


Run the portiacrawl command with nohup and & to keep it running in
the background:

nohup portiacrawl /app/data/projects/orient orient-news.net -o /app/data/orient_output.csv -t csv &
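
With a bare nohup ... &, the crawler's log messages go to nohup.out in the current directory and the process ID is easy to lose. As a sketch, the wrapper below sends all output to a log file of your choice and prints the process ID so the job can be checked or stopped later. The function name run_detached and the log path in the usage line are assumptions for illustration, not part of Portia.

```shell
# Hypothetical wrapper: run a command in the background, immune to hangups.
# $1 is the log file; the remaining arguments are the command to run.
run_detached() {
    log_file=$1
    shift
    # Detach from the terminal: no stdin, stdout and stderr go to the log file.
    nohup "$@" >"$log_file" 2>&1 </dev/null &
    # Print the background job's process ID.
    echo $!
}

# Usage inside the container (the log path is an assumption):
#   run_detached /app/data/crawl.log portiacrawl /app/data/projects/orient \
#       orient-news.net -o /app/data/orient_output.csv -t csv
```

Writing the log under /app/data keeps it on the same volume as the output file.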

Transferring the Output File:


1. Copy File from Docker to Server:
First, copy the output file from the Docker container to the server:

docker cp [container_id]:/app/data/orient_output.csv /path/on/server

2. Transfer File from Server to Local Machine:


Run scp on your local machine (not on the server) to download the file from the server:

scp username@server_ip:/path/on/server/orient_output.csv /local/directory
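
The server-side half of the transfer can be wrapped in a small function, sketched below. It assumes the output path /app/data/orient_output.csv from the crawl command. The name export_output is hypothetical. The function creates the destination directory if needed, copies the file out of the container, and lists it so you can see its size before downloading.

```shell
# Hypothetical helper, run on the Linode server.
# $1 is the container ID, $2 the destination directory on the server.
export_output() {
    container_id=$1
    dest_dir=$2
    # Make sure the destination directory exists.
    mkdir -p "$dest_dir" || return 1
    # Copy the CSV out of the container into that directory.
    docker cp "$container_id:/app/data/orient_output.csv" "$dest_dir/" || return 1
    # Show the copied file and its size.
    ls -lh "$dest_dir/orient_output.csv"
}

# Usage:
#   export_output [container_id] /path/on/server
```

Once it succeeds, the scp command is run from the local machine as before.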

Monitoring File Growth:


1. Check File Size:
Inside the container, use ls -lh to check the file size:

ls -lh /app/data/orient_output.csv

2. Watch File Growth:


Use watch to monitor the file size in real-time:

watch -n 10 'ls -lh /app/data/orient_output.csv'
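
If watch is not installed in the container, or you want to see how many bytes were added between checks, a plain loop gives similar information. This is a minimal sketch: the function names file_size and watch_growth are hypothetical, and it uses only wc, sleep, and date.

```shell
# Print the size of a file in bytes.
file_size() {
    wc -c < "$1" | tr -d ' '
}

# Hypothetical helper: sample a file's size and report the growth per sample.
# $1 = file, $2 = seconds between samples, $3 = number of samples.
watch_growth() {
    file=$1
    interval=$2
    samples=$3
    previous=$(file_size "$file")
    i=0
    while [ "$i" -lt "$samples" ]; do
        sleep "$interval"
        current=$(file_size "$file")
        # Timestamp, current size, and bytes added since the last sample.
        echo "$(date +%H:%M:%S) size=$current bytes (+$((current - previous)))"
        previous=$current
        i=$((i + 1))
    done
}

# Usage: six samples, ten seconds apart.
#   watch_growth /app/data/orient_output.csv 10 6
```

A run of (+0) lines suggests the crawl has finished or stalled.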

By following these steps, you can scrape websites using Portia on a Linode server,
ensure your data is saved even if the container stops, transfer files from the Docker
volume to your local machine, and monitor the progress of your scraping task.
