CSUB MapReduce Programming

Description

1. Write a MapReduce program that outputs the number of times each neighborhood appears in the Kaggle AirBNB dataset. You can download the dataset from here: https://www.kaggle.com/dgomonov/new-york-city-airb… You can see the schema (columns) of the dataset at the link above, too.

The file is a CSV (comma-separated values) dataset; a comma separates the fields in the dataset.

Use the WordCount approach: the Reduce stage should output the number of rentals in each neighborhood (use the neighbourhood field) together with the neighborhood group (e.g. Brooklyn) taken from the neighbourhood_group field. For each neighborhood encountered, your output should look like this (this is only an example):

Brooklyn Kensington 25

Brooklyn Clinton Hill 5

Manhattan Midtown 45
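
The WordCount pattern described above can be sketched locally before writing the Hadoop job. The following is a minimal sketch in Python, not the required solution: it simulates the map, sort and reduce stages in one process. The column positions GROUP_COL and NEIGHBOURHOOD_COL, the function names, and the use of Python itself are assumptions for illustration; check the positions against the schema at the dataset link, and use whatever language your course requires.

```python
import csv
from itertools import groupby

# Assumed 0-based column positions for neighbourhood_group and neighbourhood.
# Verify them against the schema on the dataset page before relying on them.
GROUP_COL = 4
NEIGHBOURHOOD_COL = 5


def map_line(line):
    """Map stage: emit ((group, neighbourhood), 1) for one CSV line."""
    # csv.reader keeps commas inside quoted fields from splitting the field.
    fields = next(csv.reader([line]), [])
    if len(fields) <= NEIGHBOURHOOD_COL:
        return []  # line too short to hold both fields
    if fields[GROUP_COL] == "neighbourhood_group":
        return []  # header row
    return [((fields[GROUP_COL], fields[NEIGHBOURHOOD_COL]), 1)]


def reduce_counts(pairs):
    """Reduce stage: sum the 1s for each key, as WordCount does per word."""
    # Sorting stands in for the shuffle/sort step between Map and Reduce.
    for key, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)


def run_job(lines):
    """Run map, sort and reduce over an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_line(line)]
    return [f"{g} {n} {total}" for (g, n), total in reduce_counts(pairs)]


if __name__ == "__main__":
    # Invented rows for illustration only; they are not taken from the dataset.
    sample = [
        "id,name,host_id,host_name,neighbourhood_group,neighbourhood",
        '1,"Cozy room, near park",10,Ann,Brooklyn,Kensington',
        "2,Loft,11,Bob,Manhattan,Midtown",
        "3,Studio,12,Cy,Brooklyn,Kensington",
    ]
    for row in run_job(sample):
        print(row)
```

Using the pair (group, neighbourhood) as the key lets the reducer print both names next to the count without a second lookup.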

To receive full credit, please hand in all of the following items:

-All code (please attach this homework zipped into one file).

-Your screenshots (the -cat of the output AND the job running)

2. Write MapReduce programs that further analyze the same Kaggle AirBNB dataset used in part 1 of this homework. Write WordCount-style MapReduce programs as indicated below:

2A Write a WordCount program to count the number of lines in the file. Name the program: CountLines. The Reducer should output:

Total number of lines in AirBNB file: [number]

2B Write a MapReduce WordCount program to count all lines that have fewer fields than the ideal number of fields. Name the program: CountBadShortRecords

The Reducer should output:

Total number of short lines in AirBNB file: [number]

2C Write a WordCount program to count all lines that have more fields than the ideal number of fields. Name the program: CountBadLongRecords

The Reducer should output:

Total number of long lines in AirBNB file: [number]

2D Write a MapReduce WordCount program to count all lines that contain the ideal number of fields. Name the program: CountGoodRecords

The Reducer should output:

Total number of good lines in AirBNB file: [number]
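
The four counts in 2A to 2D share one idea: split each line on commas, compare the field count with the ideal number, and emit a fixed key with a 1 for the reducer to sum. The following is a minimal local sketch in Python, not the required solution. It combines all four counts in one run only to show the shared logic; each of the four named programs would keep just one key. EXPECTED_FIELDS = 16 is an assumption based on the column list at the dataset link, and the function names are hypothetical.

```python
from collections import Counter

# Assumption: the ideal field count, taken from the schema on the dataset page.
EXPECTED_FIELDS = 16

# Output labels as required by the assignment, keyed by the mapper's keys.
LABELS = {
    "all": "Total number of lines in AirBNB file",
    "short": "Total number of short lines in AirBNB file",
    "long": "Total number of long lines in AirBNB file",
    "good": "Total number of good lines in AirBNB file",
}


def map_line(line, expected=EXPECTED_FIELDS):
    """Map stage: emit ("all", 1) plus one category key for the line."""
    # Naive split on commas: a quoted field with embedded commas makes a
    # line look long, and a field broken across lines makes it look short.
    n_fields = len(line.rstrip("\r\n").split(","))
    if n_fields < expected:
        category = "short"
    elif n_fields > expected:
        category = "long"
    else:
        category = "good"
    return [("all", 1), (category, 1)]


def reduce_counts(pairs):
    """Reduce stage: sum the 1s per key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals


def run_job(lines, expected=EXPECTED_FIELDS):
    """Run map and reduce over an iterable of lines; return labeled totals."""
    pairs = [pair for line in lines for pair in map_line(line, expected)]
    totals = reduce_counts(pairs)
    return [f"{LABELS[key]}: {totals[key]}" for key in LABELS]
```

Because every line emits the same constant key within its category, all values for that key reach a single reducer call, which is what allows one summary line per program.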

To receive full credit, please hand in all of the following items:

A. All code (please attach this homework zipped into one file).
B. Screenshots of the -cat of each of the four output files at the hlog command prompt and of each job running, for all 4 results.

