A Learning Portal from Recruitment India
________ is the most popular high-level Java API in Hadoop Ecosystem
HCatalog
Cascalog
Scalding
Cascading
Answer: Option D
Explanation:
Cascading hides many of the complexities of MapReduce programming behind more intuitive pipes and data flow abstractions.
HDFS files are designed for
Writing into a file only once.
Low latency data access.
Multiple writers and modifications at arbitrary offsets.
Only append at the end of file
Answer: Option D
Explanation:
HDFS follows a write-once, read-many model: once a file is written and closed, data can only be appended at the end. Multiple writers and modifications at arbitrary offsets are not supported, and HDFS is not designed for low-latency access.
During the execution of a streaming job, the names of the _______ parameters are transformed.
vmap
mapvim
mapreduce
mapred
Answer: Option D
Explanation:
During execution, the dots in mapred parameter names are transformed into underscores; to read the values in a streaming job's mapper or reducer, use the parameter names with the underscores (e.g. mapred.job.id becomes mapred_job_id).
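As a small illustrative sketch (not tied to a particular Hadoop release), the dot-to-underscore transformation and the way a streaming task would look a parameter up in its environment can be shown in Python; the helper name streamed_param_name is hypothetical:

```python
import os

def streamed_param_name(name):
    """Return the environment-variable form of a Hadoop job parameter:
    during a streaming job, dots in parameter names become underscores."""
    return name.replace(".", "_")

# e.g. mapred.job.id is exposed to the mapper/reducer as mapred_job_id
print(streamed_param_name("mapred.job.id"))  # mapred_job_id

# Inside a real streaming task the value would be read like this;
# outside Hadoop the variable is simply absent and None is returned.
job_id = os.environ.get(streamed_param_name("mapred.job.id"))
```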
What does commodity Hardware in Hadoop world mean?
Industry standard hardware
Low specifications Industry grade hardware
Discarded hardware
Very cheap hardware
Answer: Option B
Explanation:
Commodity hardware means affordable, low-specification, industry-grade servers rather than expensive specialized hardware; Hadoop is designed to scale out across large clusters of such machines.
Gzip (short for GNU zip) generates compressed files that have a _________ extension.
.g
.gzp
.gzip
.gz
Answer: Option D
Explanation:
You can use the gunzip command to decompress files that were created by a number of compression utilities, including Gzip.
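The behaviour is easy to demonstrate with Python's standard gzip module; a minimal round-trip sketch, with the filename chosen purely for illustration:

```python
import gzip
import os
import tempfile

data = b"Hadoop stores and processes very large data sets."

with tempfile.TemporaryDirectory() as tmp:
    # Gzip-compressed files conventionally carry the .gz extension.
    path = os.path.join(tmp, "sample.txt.gz")
    with gzip.open(path, "wb") as f:
        f.write(data)
    # gunzip (or gzip.open in read mode) restores the original bytes.
    with gzip.open(path, "rb") as f:
        restored = f.read()

assert path.endswith(".gz")
assert restored == data
```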
The ________ option allows you to copy jars locally to the current working directory of tasks and automatically unjar the files.
files
task
archives
None of the mentioned
Answer: Option C
Explanation:
The -archives option is one of Hadoop's generic options; archives passed with it are copied to the tasks' current working directory and automatically unpacked.
_________ is a pluggable Map/Reduce scheduler for Hadoop which provides a way to share large clusters.
Flow Scheduler
Data Scheduler
Capacity Scheduler
None of the mentioned
Answer: Option C
Explanation:
The Capacity Scheduler supports multiple queues; a job is submitted to a queue, and each queue is guaranteed a share of the cluster's capacity.
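For the YARN-era Capacity Scheduler, queues and their shares are declared in capacity-scheduler.xml; a minimal sketch, in which the queue names and percentages are illustrative, not defaults:

```xml
<configuration>
  <!-- Declare two top-level queues under root (names are illustrative). -->
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,analytics</value>
  </property>
  <!-- Give each queue a guaranteed percentage of cluster capacity. -->
  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>70</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.analytics.capacity</name>
    <value>30</value>
  </property>
</configuration>
```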
Which of the following must be set to true in hdfs-site.xml to enable the disk balancer?
dfs.balancer.enabled
dfs.disk.balancer.disabled
dfs.disk.balancer.enabled
dfs.diskbalancer.enabled
Answer: Option C
Explanation:
Setting dfs.disk.balancer.enabled to true in hdfs-site.xml enables the HDFS disk balancer, which spreads data evenly across the disks of a DataNode.
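Concretely, the property goes into hdfs-site.xml as a standard fragment:

```xml
<property>
  <name>dfs.disk.balancer.enabled</name>
  <value>true</value>
</property>
```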
Which of the following genres does Hadoop produce?
Relational Database Management System
Distributed file system
JAX-RS
Java Message Service
Answer: Option B
Explanation:
The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably and to stream those data sets at high bandwidth to user applications.
The compression offset map grows to ____ GB per terabyte compressed.
1-3
10-16
20-22
0-1
Answer: Option A
Explanation:
The more the data is compressed, the greater the number of compressed blocks and the larger the compression offset table.
For ___________ partitioning jobs, simply specifying a custom directory is not good enough.
static
semi cluster
dynamic
All of the mentioned
Answer: Option C
Explanation:
A dynamic partitioning job writes to multiple destinations, so instead of a directory specification it requires a pattern specification.
________ permits data written by one system to be efficiently sorted by another system.
Complex Data type
Order
Sort Order
All of the mentioned
Answer: Option C
Explanation:
Avro defines a sort order so that binary-encoded data can be ordered efficiently without deserializing it to objects, which lets data written by one system be sorted by another.
The need for data replication can arise in various scenarios such as:
Replication Factor is changed
DataNode goes down
Data Blocks get corrupted
All of the mentioned
Answer: Option D
Explanation:
Data is replicated across different DataNodes to ensure a high degree of fault tolerance; re-replication is triggered when the replication factor changes, a DataNode goes down, or data blocks become corrupted.
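The replication factor itself is configured cluster-wide (and can be overridden per file) via dfs.replication in hdfs-site.xml; a minimal fragment with the common default of three replicas:

```xml
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```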
Point out the wrong statement :
There are four namespaces for variables in Hive
Custom variables can be created in a separate namespace with the define
Custom variables can also be created in a separate namespace with hivevar
None of the mentioned
Answer: Option A
Explanation:
There are three built-in namespaces for variables: hiveconf, system, and env; custom variables go in the separate hivevar namespace.
A ________ is a way of extending Ambari that allows third parties to plug in new resource types along with the APIs.
trigger
view
schema
None of the mentioned
Answer: Option B
Explanation:
A view is an application that is deployed into the Ambari container.
_____________ is an open source system for expressive, declarative, fast, and efficient data analysis.
Flume
Flink
Flex
ESME
Answer: Option B
Explanation:
Stratosphere, the research project that became Apache Flink, combines the scalability and programming flexibility of distributed MapReduce-like platforms with the efficiency and out-of-core execution of parallel databases.
What was Hadoop written in?
Java (software platform)
Perl
Java (programming language)
Lua (programming language)
Answer: Option C
Explanation:
The Hadoop framework itself is mostly written in the Java programming language, with some native code in C and command-line utilities written as shell-scripts.
InputFormat class calls the ________ function and computes splits for each file and then sends them to the jobtracker.
puts
gets
getSplits
All of the mentioned
Answer: Option C
Explanation:
getSplits() computes the input splits, and the jobtracker uses their storage locations to schedule map tasks to process them on the tasktrackers.
Applications can use the ____________ to report progress and set application-level status messages.
Partitioner
OutputSplit
Reporter
All of the mentioned
Answer: Option C
Explanation:
The Reporter is also used to update Counters, or simply to indicate that tasks are alive.
Your client application submits a MapReduce job to your Hadoop cluster. Identify the Hadoop daemon on which the framework will look for an available slot to schedule a MapReduce operation.
NameNode
DataNode
JobTracker
TaskTracker
Answer: Option C
Explanation:
The JobTracker receives the submitted job and schedules its map and reduce tasks on TaskTrackers with available slots.
__________ are highly resilient and eliminate the single-point-of-failure risk of traditional Hadoop deployments.
EMR
Isilon solutions
AWS
None of the mentioned
Answer: Option B
Explanation:
Isilon solutions also provide enterprise data protection and security options, including file-system auditing and data-at-rest encryption, to address compliance requirements.
The output of the _______ is not sorted in the MapReduce framework for Hadoop.
Mapper
Cascader
Scalding
None of the mentioned
Answer: Option D
Explanation:
The output of the reduce task is typically written to the FileSystem; the output of the Reducer is not sorted, and the Reducer is not among the listed options.