Topic Modelling, Website Classification ($200)

Closed Posted 4 years ago Paid on delivery

I'm struggling to classify random websites properly.

In fact, I posted almost the same project previously. You can read it at the link below.

[login to view URL]

My purpose is the same as in the previous project.

To explain it further: once the tool is built, it will be applied to thousands or millions of websites.

After posting the previous project, I learned that one of the general methods is to develop an NLP model to classify websites.

For that, we need manually classified/processed data for each target type of website.

In my view, though, that is more like picking website categories one by one.

What I really want is something that classifies automatically, by judging for itself what kind of website each one is.

So, something like this: automatically calculate a similarity index between websites (probably with NLP techniques).

Then split the websites into groups according to the similarity index.

If certain websites are judged similar to each other, we would be able to bind them all into one category automatically,

without human judgement and without manually processed examples.
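
The idea above (similarity index, then binding similar sites together) can be sketched roughly as follows. This is only a minimal illustration, not a finished method: it assumes the visible text of each site has already been extracted, it uses plain word counts with cosine similarity as the similarity index, and it binds sites transitively (if A is similar to B and B to C, all three land in one group). The site names, sample texts, and the function names `term_counts`, `cosine` and `group_sites` are made up for the example.

```python
import math
import re
from collections import Counter

def term_counts(text):
    """Lower-case the text and count its word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 to 1.0)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def group_sites(pages, threshold):
    """Bind sites whose pairwise similarity is >= threshold into one group."""
    names = list(pages)
    vectors = {n: term_counts(pages[n]) for n in names}
    parent = {n: n for n in names}  # union-find forest, one root per group

    def find(n):
        # Follow parent links to the root, shortening the path on the way.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())

# Hypothetical sample data: already-extracted page text per site.
pages = {
    "shop-a.example": "buy shoes online cart checkout free shipping shoes sale",
    "shop-b.example": "online store cart checkout shipping sale buy clothes",
    "forum-a.example": "forum thread reply post members discussion board",
    "forum-b.example": "discussion forum members post reply new thread",
}
print(group_sites(pages, threshold=0.5))
```

Comparing every pair is fine for a demonstration but grows quadratically, so a run over millions of sites would likely need an approximate nearest-neighbour step instead of the double loop.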

So this is what the project should eventually look like.

If you are wondering about the original purpose of this project, it is to get a fresh view of what kinds of websites exist.

As for 'website classification' itself, I do wonder whether some work has already been done on it. I'm sure there is some.

You see, when you look at e-commerce products, there are always categories for what kind of product it is: clothes, shoes, computer, USB, or furniture.

I wonder whether some website has already judged and classified such categories for 'websites'.

So perhaps we could see rough categories of websites:

e-commerce

community website

porn website

....

In this project, there are some specifics you should consider. Please read below.

Condition 1.

It should be able to operate on a global scale. When you search around, you can expect mostly Anglosphere websites written in English, but the purpose of the project is to also classify websites from other markets and other countries, for example websites written in Russian, Hindi, or Chinese.

This is why manual data input could be meaningless, and measuring a similarity index to derive the website category would be the only way.
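
One way to compare text without any per-language word list or tokenizer is to count overlapping character n-grams, which works on any Unicode script, including Chinese where words are not separated by spaces. The sketch below is an assumption about how that could look, not a requirement of the project; the sample sentences and the names `char_ngrams` and `cosine` are invented for illustration. Note the limit of this approach: it compares pages within one language, so a Russian shop and an English shop would not look similar without translation or a multilingual model.

```python
import math
import unicodedata
from collections import Counter

def char_ngrams(text, n=3):
    """Count overlapping character n-grams after Unicode normalisation."""
    text = unicodedata.normalize("NFKC", text).casefold()
    text = " ".join(text.split())  # collapse runs of whitespace
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 to 1.0)."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Hypothetical snippets of page text in two scripts.
ru_shop_1 = "купить обувь онлайн, бесплатная доставка"
ru_shop_2 = "купить одежду онлайн, доставка по России"
zh_forum = "论坛 讨论 回复 帖子 会员"

same_language = cosine(char_ngrams(ru_shop_1), char_ngrams(ru_shop_2))
cross_script = cosine(char_ngrams(ru_shop_1), char_ngrams(zh_forum))
print(same_language > cross_script)
```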

Condition 2.

Please show me how it works on 10 examples.

Condition 3.

After you show me the results of condition 2, I can cross-check them using images from my own background material.

Condition 4.

When the cross-check in condition 3 is done, I will release the milestone.

Condition 5.

Once the main script is finished, it will need to run multi-threaded to make up for speed. (The tool will be applied to thousands or millions of websites, so speed itself is an important matter.)
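
As a rough sketch of the multi-threaded part, Python's standard `concurrent.futures.ThreadPoolExecutor` can run many downloads at once. The `fetch_text` function below is a hypothetical placeholder that returns fake text so the example runs without network access; a real tool would download and strip the page there. The name `fetch_all`, the URLs, and the worker count of 32 are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_text(url):
    """Hypothetical placeholder: a real tool would download the page here."""
    return f"text of {url}"

def fetch_all(urls, fetch=fetch_text, max_workers=32):
    """Fetch many URLs concurrently; failures are recorded, not raised."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each submitted job back to its URL.
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # one bad site must not stop the run
                errors[url] = exc
    return results, errors

urls = [f"https://site-{i}.example" for i in range(10)]
results, errors = fetch_all(urls)
print(len(results), len(errors))
```

In the standard CPython interpreter, threads mainly help with the network-bound downloading; the similarity calculation itself is CPU-bound and would more likely benefit from multiple processes.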

Condition 6.

The tool should have a similarity index variable inside the script, so I can adjust how narrow or wide the similarity degree will be.
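
One possible way to expose that variable is as a command-line option, so it can be changed without editing the script. The sketch below uses the standard `argparse` module; the option name `--threshold`, the default of 0.5, and the 0-to-1 range are assumptions, not part of the original requirement.

```python
import argparse

def similarity_threshold(value):
    """Parse a threshold and require it to lie between 0 and 1."""
    number = float(value)
    if not 0.0 <= number <= 1.0:
        raise argparse.ArgumentTypeError("threshold must be between 0 and 1")
    return number

def build_parser():
    parser = argparse.ArgumentParser(description="Group websites by similarity")
    parser.add_argument(
        "--threshold",
        type=similarity_threshold,
        default=0.5,
        help="minimum similarity for two sites to share a category",
    )
    return parser

# A higher value gives narrower groups, a lower value gives wider ones.
args = build_parser().parse_args(["--threshold", "0.7"])
print(args.threshold)
```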

Essential Note 1.

If you know of a service/website that can satisfy the project purpose and provides an API that clients can use from a script or the command line, I'm also open to using such a service. You would need to help me use the script. (But in the case of a third-party software/service API, since it is not something made by you and since it will require regular payments to that third party, the offer price would be much lower than $200. I would release $60 for setting up the script that uses the API. Please keep that in mind.)

Reference keywords/links.

[login to view URL]

[login to view URL]

Before bidding: please explain briefly how the work would be done, or explain what other steps are needed before going deep into the main work, so that we can get this job done together.

C Programming, C++ Programming, Data Scraping, Natural Language, Web Crawling

Project ID: #19717141

About the project

3 proposals Remote project Active 4 years ago

3 freelancers are bidding on average $157 for this job

engineeringexp

A Data Scientist with experience in Python, R programming, R Shiny, R studio and anything related to data science and python Master in Engineering, Electrical and Electronic Engineer, who is dynamic, reliable, resou...

$30 USD in 3 days
(4 Reviews)
2.9