California State University Cleaning and Profiling Code Worksheet

Description


Cleaning and Profiling Code

Use only Hadoop MapReduce in this part of your project.

Do not use anything else.

You must write and submit 2 separate MapReduce jobs:


MR Job 1.

Data profiling – to explore your data

– Name the files: CountRecs.java, CountRecsMapper.java, CountRecsReducer.java

(Please use these exact names for your classes)

– This MR job counts the number of records in a dataset

– Run it on the original dataset, before cleaning, and output the number of records

– Run it on the cleaned dataset (the result of MR Job 2, described below) and output the number of records

– If the two record counts don’t match, you should figure out why that is

– Re-submit a schema if it has changed.
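
The counting logic for MR Job 1 can be sketched locally before it is moved into the Hadoop classes. The block below is a minimal model in plain Java, with no Hadoop dependency, so it is not the required CountRecsMapper or CountRecsReducer themselves. The class name CountRecsSketch, the key string "Total records", and the run helper that stands in for the shuffle are all assumptions made for illustration. In the real job, the body of map would go in CountRecsMapper and the body of reduce in CountRecsReducer.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Local model of the record-counting job. All names here are hypothetical.
class CountRecsSketch {
    // Single constant key so every record lands in the same reduce group.
    static final String KEY = "Total records";

    // Map step: emit (KEY, 1) for each input line, whatever its content.
    static Map.Entry<String, Long> map(String line) {
        return new AbstractMap.SimpleEntry<>(KEY, 1L);
    }

    // Reduce step: sum every value that arrived for one key.
    static long reduce(String key, List<Long> values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum;
    }

    // Stand-in for the framework: group mapper output by key, then reduce each group.
    static Map<String, Long> run(List<String> lines) {
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<String, Long> kv = map(line);
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        Map<String, Long> out = new TreeMap<>();
        grouped.forEach((k, vs) -> out.put(k, reduce(k, vs)));
        return out;
    }
}
```

This sketch counts every line, so a header row, if the dataset has one, is counted as a record. That is one possible reason the count on the original dataset and the count on the cleaned dataset could differ.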

MR Job 2.

Data cleaning – to avoid nasty exceptions later on in your analytic

– Name the files: Clean.java, CleanMapper.java, CleanReducer.java

(Please use these exact names for your classes)

– This MR job cleans the data – for example, by dropping columns you don’t need.

– It should write out a new file with only the columns you will use in your analytic.

– The selected columns define your data schema
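
The column-dropping step for MR Job 2 can be sketched the same way. The block below is a plain Java model of the per-record logic, not the required CleanMapper or CleanReducer. It assumes a comma-delimited file with four fields per row, of which the first and third are kept. The delimiter, the field count, the kept indices, and the class name CleanSketch are all hypothetical and would need to match your own dataset. In the real job, cleanLine would most naturally sit inside CleanMapper, with CleanReducer passing records through unchanged; that split is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Local model of the cleaning job. All constants and names are hypothetical.
class CleanSketch {
    static final int EXPECTED_FIELDS = 4;   // assumed number of columns in the raw data
    static final int[] KEEP = {0, 2};       // assumed indices of the columns the analytic needs

    // Returns the cleaned record, or empty if the row should be dropped.
    static Optional<String> cleanLine(String line) {
        // Naive split: does not handle quoted fields that contain commas.
        // The -1 limit keeps trailing empty fields so the length check is accurate.
        String[] fields = line.split(",", -1);
        if (fields.length != EXPECTED_FIELDS) {
            return Optional.empty();        // malformed row
        }
        List<String> kept = new ArrayList<>();
        for (int i : KEEP) {
            String f = fields[i].trim();
            if (f.isEmpty()) {
                return Optional.empty();    // a required column is missing
            }
            kept.add(f);
        }
        return Optional.of(String.join(",", kept));
    }

    // Apply cleanLine to every input line and keep only the surviving records.
    static List<String> run(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            cleanLine(line).ifPresent(out::add);
        }
        return out;
    }
}
```

Because rows with the wrong field count or an empty required column are dropped here, the cleaned dataset can contain fewer records than the original, which is worth keeping in mind when comparing the two counts from MR Job 1.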

For full credit, provide all of the classes named above for each job.
