Dev notes: Working with Cassandra tables from Apache Pig

Friday, October 23, 2015

Along with Hive, Apache Pig is another great tool for analyzing big data, and a lot of useful scripts and reports have been built with it. Nowadays Apache Spark is becoming the standard for big data processing.

Here is my short step-by-step guide to connecting Apache Pig to data stored in Apache Cassandra.

Set up environment

First we need to tell Pig the address and port of the Cassandra cluster. This can be done through environment variables or later in the Pig script, through the connection string parameters. On Unix machines the environment variables are exported in the shell before Pig is started.
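A minimal sketch of that export step, assuming a single local node on the default Thrift port. The variable names are the ones I understand Cassandra's Pig storage classes to read; the values are placeholders, and the partitioner line is included on the assumption that the storage class needs it as well.

```shell
# Assumed values for a single local node; replace them with your cluster's settings.
export PIG_INITIAL_ADDRESS=127.0.0.1   # any reachable Cassandra node
export PIG_RPC_PORT=9160               # Thrift RPC port (Cassandra's default)
export PIG_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner  # should match the cluster's partitioner
```

Run these in the same shell session that later starts Pig, so the variables are visible to it.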

Register jars

Now we can start grunt, the Pig shell. Pig needs the Cassandra jars before it can talk to the cluster, and you plug them in with the REGISTER command.
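Which jars are needed depends on your Cassandra version, so rather than guess at file names, this sketch generates one REGISTER statement for every jar in Cassandra's lib directory. The helper name `write_register_script`, the output file name and the `/opt/cassandra` fallback path are all hypothetical; a shorter hand-picked list of jars may well be enough.

```shell
# Emit one REGISTER statement per jar found in a directory.
# Usage: write_register_script <lib_dir> <output_file>
write_register_script() {
  lib_dir=$1
  out_file=$2
  : > "$out_file"                      # start with an empty script
  for jar in "$lib_dir"/*.jar; do
    [ -e "$jar" ] || continue          # skip the unexpanded glob when no jar matches
    printf 'REGISTER %s;\n' "$jar" >> "$out_file"
  done
}

# Hypothetical install location; point it at your own Cassandra lib directory.
write_register_script "${CASSANDRA_HOME:-/opt/cassandra}/lib" register_cassandra.pig
```

The generated statements can be pasted at the grunt prompt or placed at the top of a Pig script.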

Fetch datasource

Now everything is ready to fetch data from your Cassandra table. Let's assume that you have a keyspace named 'mykeyspace' and a table named 'mytable'. A LOAD statement pointing at that keyspace and table fetches the whole table from Cassandra.

Here is the full specification of the Cassandra connection string:
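As a rough sketch, the load could look like the script written below. The loader class `org.apache.cassandra.hadoop.pig.CqlStorage` is an assumption on my part, so check which storage class your Cassandra version ships; the script file name is hypothetical, and the LIMIT is only there to keep the printed output small.

```shell
# Write a small Pig script that loads the table and prints the first rows.
# 'mykeyspace' and 'mytable' are the example names used in the text;
# CqlStorage is the assumed loader class.
cat > load_mytable.pig <<'EOF'
rows = LOAD 'cql://mykeyspace/mytable' USING org.apache.cassandra.hadoop.pig.CqlStorage();
first_rows = LIMIT rows 10;
DUMP first_rows;
EOF

# Once the environment variables are set and the jars are registered,
# the script can be run with, for example:
#   pig -x local load_mytable.pig
```

The same three statements can also be typed directly at the grunt prompt.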

cql://[username:password@]<keyspace>/<columnfamily>[?[page_size=<size>][&columns=<col1,col2>][&output_query=<prepared_statement>][&where_clause=<clause>][&split_size=<size>][&partitioner=<partitioner>][&use_secondary=true|false][&init_address=<host>][&rpc_port=<port>]]

As you can see, we can fetch not only the whole table but also a particular view expressed as a select statement, or just some columns.
We can also set the Cassandra cluster host and port in the connection string instead of in environment variables.
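To illustrate, here is a sketch that assembles a connection string from the `columns`, `init_address` and `rpc_port` parameters of the specification above and uses it in a LOAD. The column names `id` and `name`, the host, the port and the file name are made up for the example, and `CqlStorage` is again an assumed loader class.

```shell
# Placeholder values for illustration only.
KEYSPACE=mykeyspace
TABLE=mytable
HOST=127.0.0.1      # Cassandra node to contact first
PORT=9160           # Thrift RPC port
COLUMNS=id,name     # hypothetical column names

# Assemble the connection string from the parameters in the specification.
CQL_URL="cql://${KEYSPACE}/${TABLE}?columns=${COLUMNS}&init_address=${HOST}&rpc_port=${PORT}"

# Write a Pig script that loads only the selected columns and prints a few rows.
cat > load_columns.pig <<EOF
rows = LOAD '${CQL_URL}' USING org.apache.cassandra.hadoop.pig.CqlStorage();
first_rows = LIMIT rows 10;
DUMP first_rows;
EOF
```

With the host and port carried in the connection string, the script does not depend on the environment variables from the first step.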
