Remote Data Mining And Management Job In Data Science And Analytics

Build out Advanced Analytics and BI Platform


I will share my use case and am looking to build out an Advanced Analytics and BI platform that is cost-effective, viable, and scalable with shortlisted candidates.

The current use case is to collect data from CRM service providers such as Salesforce and
Microsoft CRM 365, transform it into meaningful data by logically joining the different
entities received, and persist it into a Data Warehouse.
Business Intelligence dashboards will then be developed, integrated with the Data
Warehouse, and used to perform analysis with analytical queries (joining, grouping,
sorting, etc.).

Along with this, an Advanced Analytics Platform will be developed in which the Data Science
team will first perform basic analysis and then build and train Machine Learning models on
top of it for Predictive Analytics and Recommendation Engines.
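
For illustration, a minimal PySpark sketch of the Predictive Analytics / Recommendation
Engine step, assuming the warehouse already exposes a cleaned interaction table; the table
name and columns (warehouse.crm_interactions, customer_id, product_id, score) are
placeholders, not part of the spec:

    from pyspark.sql import SparkSession
    from pyspark.ml.recommendation import ALS

    spark = (SparkSession.builder
             .appName("crm-recommendations")
             .enableHiveSupport()          # table schemas live in the Hive metastore
             .getOrCreate())

    # Cleaned interaction data published by the warehouse layer (placeholder table).
    # ALS expects integer user/item ids; assume the table already provides them.
    interactions = spark.table("warehouse.crm_interactions")

    # Simple collaborative-filtering model for the recommendation engine.
    als = ALS(userCol="customer_id", itemCol="product_id", ratingCol="score",
              coldStartStrategy="drop")
    model = als.fit(interactions)

    # Top-5 recommendations per customer, persisted for downstream use.
    model.recommendForAllUsers(5) \
         .write.mode("overwrite").saveAsTable("warehouse.crm_recommendations")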

Create the following modules:

1. Data Sources Management
Using this module, the user will be able to configure the data sources from which data
needs to be collected.
Once the user configures a data source, a data ingestion job will be submitted to Apache
Gobblin, and Gobblin will start collecting data from that source.
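
A minimal sketch of what this module could submit per configured source, assuming a
standalone Gobblin deployment that watches a job-configuration directory; the property
values, the Salesforce source class, and the MinIO bucket path are assumptions to verify
against the Gobblin connectors actually used:

    from pathlib import Path

    def submit_gobblin_job(source_name: str, jobs_dir: str = "/opt/gobblin/jobs") -> Path:
        """Write a Gobblin .pull job file for a user-configured data source."""
        job_config = "\n".join([
            f"job.name=ingest_{source_name}",
            "job.group=crm_ingestion",
            # Assumed connector class; swap in the one matching the CRM source.
            "source.class=org.apache.gobblin.salesforce.SalesforceSource",
            "extract.table.type=snapshot_only",
            "writer.output.format=PARQUET",
            f"data.publisher.final.dir=s3a://datalake/raw/{source_name}",  # assumed bucket
        ])
        job_file = Path(jobs_dir) / f"ingest_{source_name}.pull"
        job_file.write_text(job_config)
        return job_file

    # Example: the user configures a Salesforce source in the UI.
    submit_gobblin_job("salesforce")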

2. Real-Time Analytics
Once the ingested data is available on Kafka topics, Spark Structured Streaming will be
used to process and transform it in a distributed way and write the results to MariaDB
ColumnStore.
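
A minimal Spark Structured Streaming sketch for this step; the broker address, topic name,
event schema, and MariaDB connection details are placeholders (the spark-sql-kafka package
and the MariaDB JDBC driver must be on the classpath):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StringType, StructField, StructType

    spark = SparkSession.builder.appName("crm-realtime").getOrCreate()

    # Assumed event shape; extend with the real CRM entity fields.
    schema = StructType([StructField("account_id", StringType()),
                         StructField("event_type", StringType())])

    # Read the raw JSON events from Kafka.
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "kafka:9092")   # placeholder broker
              .option("subscribe", "crm_events")                 # placeholder topic
              .load()
              .select(from_json(col("value").cast("string"), schema).alias("e"))
              .select("e.*"))

    # Write each micro-batch to MariaDB ColumnStore over JDBC.
    def write_to_columnstore(batch_df, batch_id):
        (batch_df.write.format("jdbc")
         .option("url", "jdbc:mariadb://mariadb:3306/analytics")  # placeholder URL
         .option("driver", "org.mariadb.jdbc.Driver")
         .option("dbtable", "crm_events_rt")                      # placeholder table
         .option("user", "analytics")
         .option("password", "change-me")
         .mode("append")
         .save())

    events.writeStream.foreachBatch(write_to_columnstore).start().awaitTermination()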

3. Data Lake
A Data Lake will be required because a large volume of data will be ingested, while the
Data Warehouse will hold only the cleaned and transformed version of the data.
All data collected from the CRM data sources arrives in JSON format, so Apache Gobblin
will convert the JSON data into Parquet format before loading it into the Data Lake for
better I/O and low-latency reads.
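
The JSON-to-Parquet conversion is Gobblin's job per the spec; for backfills or spot checks,
the same conversion can be sketched in PySpark against a MinIO-backed lake over s3a
(endpoint, credentials, and paths below are placeholders):

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("json-to-parquet")
             # MinIO is S3-compatible, so the lake is addressed through s3a.
             .config("spark.hadoop.fs.s3a.endpoint", "http://minio:9000")   # placeholder
             .config("spark.hadoop.fs.s3a.access.key", "minio")             # placeholder
             .config("spark.hadoop.fs.s3a.secret.key", "change-me")         # placeholder
             .config("spark.hadoop.fs.s3a.path.style.access", "true")
             .getOrCreate())

    # Raw CRM exports land as JSON; rewrite them as Parquet for cheaper, faster scans.
    raw = spark.read.json("s3a://datalake/raw/salesforce/")            # placeholder path
    raw.write.mode("overwrite").parquet("s3a://datalake/parquet/salesforce/")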

4. Data Processing
For Data Processing, Apache Spark will be used which is distributed data processing
engine and Data Processing Jobs will be scheduled using Apache Airflow and it will
read latest data from Data Lake and apply required transformations and then persist the
data to Data Warehouse.
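
A minimal Airflow sketch of the scheduling piece, assuming the apache-spark provider
package is installed and a spark_default connection exists; the DAG id, schedule, and
application path are placeholders:

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

    # Daily job: read the latest lake partitions, transform, load the warehouse.
    with DAG(
        dag_id="crm_lake_to_warehouse",                      # placeholder id
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        transform_and_load = SparkSubmitOperator(
            task_id="transform_and_load",
            application="/opt/jobs/lake_to_warehouse.py",    # placeholder script
            conn_id="spark_default",
            conf={"spark.sql.sources.partitionOverwriteMode": "dynamic"},
        )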

5. Data Warehousing
For the Data Warehouse, Hive on MinIO will be used, with Parquet as the file format. Hive
will act as the metastore: schemas for the various tables will be defined in it, and each
table will point to its corresponding MinIO storage location.
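
A sketch of how a warehouse table could be declared, using Spark SQL with Hive support so
the Hive metastore keeps the schema while the Parquet files stay on MinIO; the database,
table, columns, and bucket are placeholders:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("warehouse-ddl")
             .enableHiveSupport()           # table schemas are registered in Hive
             .getOrCreate())

    spark.sql("CREATE DATABASE IF NOT EXISTS warehouse")

    # External table: Hive stores only metadata; the data stays in MinIO as Parquet.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS warehouse.crm_accounts (
            account_id STRING,
            account_name STRING,
            created_at TIMESTAMP
        )
        STORED AS PARQUET
        LOCATION 's3a://warehouse/crm_accounts/'
    """)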

Business Intelligence
Both query engines, i.e. Spark SQL on Hive and MariaDB ColumnStore, support JDBC.
So any BI tool can connect to them over standard JDBC connections, execute analytical
queries, and build various charts/graphs.
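
Outside a BI tool, the same JDBC path can be exercised from a script; a minimal sketch
using the jaydebeapi bridge against the Spark Thrift Server (HiveServer2 protocol), with
the host, credentials, driver jar path, and query as placeholders:

    import jaydebeapi

    # Connect over Hive JDBC exactly as a BI tool would.
    conn = jaydebeapi.connect(
        "org.apache.hive.jdbc.HiveDriver",
        "jdbc:hive2://spark-thrift:10000/warehouse",          # placeholder host/db
        ["analytics", "change-me"],                           # placeholder credentials
        "/opt/drivers/hive-jdbc-standalone.jar",              # placeholder jar path
    )

    cur = conn.cursor()
    # A typical analytical query: aggregate, group, sort.
    cur.execute("""
        SELECT date_format(created_at, 'yyyy-MM') AS month, COUNT(*) AS new_accounts
        FROM warehouse.crm_accounts
        GROUP BY date_format(created_at, 'yyyy-MM')
        ORDER BY month
    """)
    print(cur.fetchall())
    cur.close()
    conn.close()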

About the recruiter
Shamsaagazarzoo
from Bacs-Kiskun, Hungary
Member since Mar 14, 2020

Skills & Expertise Required

Apache Hive, Apache Kafka, Apache Spark, Backend Rest API, Minio

Open for hiring. Apply before: Sep 9, 2024

Work from Anywhere

40 hrs / week

Fixed Type

Remote Job

Cost: $19,141.05

