
Chatgpt

The document provides Pig scripts to perform four tasks on a movies dataset: 1) declare a bag using the movies data from a CSV file; 2) filter movies with a rating greater than 3; 3) display the number of rows in the movies dataset; 4) group the movies by genre.

Uploaded by

PRAMOTH KJ
Copyright © All Rights Reserved

12/11/23, 11:32 PM ChatGPT

You:

a) Create a Pig script to declare a bag using movies_data.csv.
b) Create a Pig script to filter movies with a rating > 3.
c) Create a Pig script to display the number of rows.
d) Create a Pig script to group movies by genre.

------------------

ChatGPT:

Sure, I can help you with that! Below are the Pig scripts for each of the tasks you mentioned:

These scripts assume a CSV file named movies_data.csv with the columns movie_id, movie_name, genre, and rating, in that order. PigStorage maps fields to the AS schema by position, not by name.
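If movies_data.csv starts with a header line, PigStorage loads that line as an ordinary record. Because the text "movie_id" cannot be cast to int, that field loads as null, so one minimal workaround, assuming every real data row has a valid integer movie_id, is to filter such records out right after loading. The relation name movies_raw is hypothetical:

```pig
-- Hypothetical variant for a movies_data.csv whose first line is a header.
movies_raw = LOAD 'movies_data.csv' USING PigStorage(',') AS (movie_id:int, movie_name:chararray, genre:chararray, rating:float);

-- The header's movie_id text cannot be cast to int, so it loads as null; drop those records.
movies = FILTER movies_raw BY movie_id IS NOT NULL;
```

This also drops any data row whose movie_id is missing or malformed, so it is only appropriate when such rows should be discarded anyway.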

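a) Create a Pig script to declare a bag using movies_data.csv:

```pig
-- Load the data from the CSV file (fields are mapped by position)
movies = LOAD 'movies_data.csv' USING PigStorage(',') AS (movie_id:int, movie_name:chararray, genre:chararray, rating:float);

-- Put every tuple into one bag: GROUP ... ALL yields a single tuple (group, movies),
-- where group is the constant 'all' and movies is a bag holding the whole dataset
movie_bag = GROUP movies ALL;

-- Display the bag
DUMP movie_bag;
```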
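To check the structure of the bag without dumping all of the data, Pig's DESCRIBE operator prints a relation's schema. A minimal sketch, assuming these lines are appended to the script above:

```pig
-- Schema of the loaded relation: a relation is itself an outer bag of tuples
DESCRIBE movies;

-- Schema of the grouped relation: a 'group' field plus an inner bag named 'movies'
DESCRIBE movie_bag;
```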

b) Create a Pig script to filter movies with a rating greater than 3:

```pig
-- Load the data from the CSV file
movies = LOAD 'movies_data.csv' USING PigStorage(',') AS (movie_id:int, movie_name:chararray, genre:chararray, rating:float);

-- Keep only movies with a rating greater than 3
high_rated_movies = FILTER movies BY rating > 3;

-- Display the filtered results
DUMP high_rated_movies;
```
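DUMP only prints the result to the console. To keep the filtered rows, one option is to sort them and write them out with STORE; the relation name sorted_movies and the output directory high_rated_movies_out are hypothetical, and Pig normally refuses to write to an output location that already exists:

```pig
-- Optional: put the highest-rated movies first
sorted_movies = ORDER high_rated_movies BY rating DESC;

-- Write the result as comma-separated part files under the output directory
STORE sorted_movies INTO 'high_rated_movies_out' USING PigStorage(',');
```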

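c) Create a Pig script to display the number of rows:

```pig
-- Load the data from the CSV file
movies = LOAD 'movies_data.csv' USING PigStorage(',') AS (movie_id:int, movie_name:chararray, genre:chararray, rating:float);

-- Collect all tuples into a single group, then count them.
-- COUNT_STAR counts every tuple; COUNT would skip tuples whose first field (movie_id) is null.
all_movies = GROUP movies ALL;
num_rows = FOREACH all_movies GENERATE COUNT_STAR(movies) AS row_count;

-- Display the result
DUMP num_rows;
```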

d) Create a Pig script to group movies by genre:

```pig
-- Load the data from the CSV file
movies = LOAD 'movies_data.csv' USING PigStorage(',') AS (movie_id:int, movie_name:chararray, genre:chararray, rating:float);

-- Group movies by genre: each output tuple is (group, movies), where group is
-- the genre value and movies is a bag of all tuples with that genre
movies_by_genre = GROUP movies BY genre;

-- Display the result
DUMP movies_by_genre;
```
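A grouped relation is usually fed into a per-group aggregate. As a sketch, appended to the script above, this counts the movies in each genre and averages their ratings; the output field names genre, num_movies, and avg_rating are illustrative:

```pig
-- For each genre, 'group' holds the genre and 'movies' the bag of its tuples
genre_stats = FOREACH movies_by_genre GENERATE
    group AS genre,
    COUNT_STAR(movies) AS num_movies,
    AVG(movies.rating) AS avg_rating;  -- AVG ignores null ratings

DUMP genre_stats;
```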

Note: Make sure to adjust the column names and data types in the Pig scripts based on your actual CSV file structure.

------------------

https://chat.openai.com/c/a4d305b4-3075-49dc-815c-15286d6d8ff2
