Spark Scala NLP specialist to optimize my code


I wrote an Apache Spark Scala program to compute TF-IDF over a corpus. It hangs at a point near a groupBy statement, and I want someone who can fix that issue.

I have a list of articles stored in S3 as Parquet, so first I read it as a DataFrame, create n-grams from it, and keep that on one hand.
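The n-gram step above can be sketched with Spark ML's `Tokenizer` and `NGram` transformers. This is a minimal local example, not the poster's actual code: the `text` column name, bigram size (`setN(2)`), and the in-memory stand-in for the S3 Parquet read are all assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.feature.{NGram, Tokenizer}

val spark = SparkSession.builder().master("local[*]").appName("ngrams").getOrCreate()
import spark.implicits._

// Stand-in for the articles Parquet; in the real job this would be
// spark.read.parquet("s3://.../articles") with whatever schema it has.
val articles = Seq((1L, "Spark Scala NLP job posting")).toDF("id", "text")

// Tokenize (lowercases and splits on whitespace), then build bigrams.
val tokens = new Tokenizer().setInputCol("text").setOutputCol("tokens").transform(articles)
val ngrams = new NGram().setN(2).setInputCol("tokens").setOutputCol("ngrams").transform(tokens)

ngrams.select("ngrams").show(false)
```

The same pipeline can be reused unchanged on the corpus DataFrame so both sides produce comparable n-grams.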

On the other hand, S3 holds 10k posts (the corpus) as Parquet. I read that as a DataFrame as well and keep it.

So now I want to compute the document frequency of each term (n-gram) against the corpus.
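One way to express that document-frequency step is the sketch below. It is an illustration under assumptions (the `doc_id`/`ngrams` column names and the tiny in-memory corpus are hypothetical), not the poster's code; the `distinct()` before the `groupBy` is the detail worth noting, since grouping raw exploded rows both inflates counts and bloats the shuffle, which is a common reason a groupBy on a large corpus grinds to a halt.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").appName("docfreq").getOrCreate()
import spark.implicits._

// Hypothetical corpus: (doc_id, ngrams) — a stand-in for the 10k S3 posts
// after the same n-gram extraction as the articles.
val corpus = Seq(
  (1L, Seq("big data", "data spark")),
  (2L, Seq("big data", "nlp scala")),
  (3L, Seq("nlp scala"))
).toDF("doc_id", "ngrams")

// Document frequency: for each n-gram, the number of DISTINCT documents
// containing it. Deduplicating (doc_id, term) pairs BEFORE the groupBy
// counts each document once per term and shrinks the shuffled data.
val docFreq = corpus
  .select($"doc_id", explode($"ngrams").as("term"))
  .distinct()
  .groupBy("term")
  .count()
  .withColumnRenamed("count", "doc_freq")

docFreq.orderBy("term").show()
```

From there, IDF is a per-term map over `doc_freq` (e.g. `log(N / doc_freq)` with N the corpus size), joined back onto the articles' term counts.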

[Removed by Admin]

I have already written the code, and I'm willing to share it with the right candidate.

Word2Vec knowledge is a plus.

Skills: Natural Language, Scala, Spark


About the Employer:
( 3 reviews ) Bangalore, India

Project ID: #18112853

2 freelancers are bidding on average $5/hour for this job


Hi, hope you're doing great. Rigeldata Solutions offers custom application development and maintenance services using Open Source Software (OSS) solutions. We have a strong practice in Apache Spark, Apache NiFi, and Kylo.

$5 USD / hour
(0 Reviews)