Submitted by CuriousCesarr t3_109tysl in MachineLearning
[removed]
A fair question. What information would you like me to provide?
This isn't an idea guy post. But to pitch something for funding, you need a rough estimate of time and costs and some deliverable milestones.
Sure, but no one can give you any of those things with the back-of-the-napkin idea you've provided. You're a very long way from being able to pitch for funding.
Well, the thing is that my friend doesn't pitch ideas to people with money as a job. He's just friends with them and they go out for coffee/dinner, sometimes they make a deal, etc. So a "formal approach" for VC funding doesn't apply here.
Truly, a back-of-the-napkin idea won't catch anyone's eye. That's why I'm searching for someone who can give some feasible milestones, a timeframe and budgets for them, and he will present that.
1). Not really. Data would have to be processed, but it can probably be introduced via a well-structured form:
2). A great question! Zillow seems like a great example. As accurate as possible I guess. But a good starting point would be a price range.
Ah ok. On the first point I guess whoever you are looking for will need to spend a considerable amount of time building/finding a dataset to train a model.
On the second point, I might have incorrectly assumed you were familiar with the Zillow controversy around price prediction.
The TL;DR is that the ML team reportedly used a model to forecast prices using a tool made by Facebook called Prophet. The model was probably accurate enough for displaying a rough prediction on a website. Another team in Zillow started using these price predictions to flip houses and lost a whole bunch of money since the model was not designed to do this.
A lot of armchair data scientists quickly pointed the finger at Prophet for being a "bad" model. The reality is all models are bad if they are used for the wrong reason. In this case, the team flipping houses likely didn't listen to the data science team when they said the model shouldn't be used for that purpose.
This is why it's a good idea to know how the model outputs are going to be used. The obvious answer is always "as accurate as possible" but sometimes that might not be accurate enough...
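To make the "accurate enough for what?" point concrete, here is a minimal sketch that compares a model's error to the tolerance of the intended use. The function names, the sample prices and the tolerance figures are all made up for illustration; real thresholds would have to come from whoever uses the outputs.

```python
from statistics import median


def median_abs_pct_error(actual, predicted):
    """Median of |predicted - actual| / actual over all homes."""
    return median(abs(p - a) / a for a, p in zip(actual, predicted))


# Hypothetical tolerances: a rough website estimate can live with a larger
# error than a team that buys and resells homes on thin margins.
TOLERANCE = {"website_estimate": 0.10, "house_flipping": 0.02}


def fit_for_purpose(actual, predicted, use_case):
    """True if the median error is within the tolerance of the use case."""
    return median_abs_pct_error(actual, predicted) <= TOLERANCE[use_case]


# Made-up sale prices and model predictions.
actual = [100_000, 200_000, 300_000]
predicted = [105_000, 190_000, 318_000]

print(fit_for_purpose(actual, predicted, "website_estimate"))
print(fit_for_purpose(actual, predicted, "house_flipping"))
```

The same predictions can pass one check and fail the other, which is the reason to agree on the use case before agreeing on an accuracy target.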
Hope this helps!
No, I'm European so I have no idea about the Zillow debacle sadly.
The outputs would probably be used as a price evaluator for the living space (my friend works as a registrar of new/bought homes). Honestly, I think the Zillow use case might be desired ultimately.
Would you be interested? :)
I’m pretty sure that I will have students who are willing to do this. But just how do you plan on getting the data? Or is it something the guy you hire would have to sort out?
Also, not to be pessimistic, but I absolutely do not think it would be possible to make a deep learning model that predicts how many square meters a property is based on some pictures alone. This is a mammoth task.
Well, you'd get the pics AND the info on the livable area (total livable area, nr of rooms, a sketch of the place, livable area for each room, etc.).
Oh okay, I do apologise, I thought it was just through pictures. Still a hard task. I do have some students in my PhD cohort who would be willing to work on this. Lmk if you are interested and I can set up a meeting.
Sorry for the late reply but I had a very busy period. In the end, I found a small Greek ML company that was excited about the project and we entered deeper discussions. I also updated my post to reflect this. Have a great day! :)
So as far as I understand the project, you want to estimate the price of real estate. There're a few ways to do this. Forget pictures for the moment, just go with listed/numeric information.
You have information like Area/Square Footage, Listed Amenities, Age, Location, etc. If you have existing data of this sort, where it lists all the above and then a price, then it is fairly straightforward to pull off, but no guarantees on the accuracy. This has been done by plenty of people, so if you just do this your investors will probably ask you about how you're going to compete with established Real Estate companies who have much bigger teams and much more data.
Now let's consider images: you have pictures of the house, and you want to use those pictures as a way to measure how broken-down/upscale the house is and use that as a parameter to base the price on. You are going to combine this with the above, of course, because it's ridiculous otherwise. I'll say this frankly: this hasn't really been done, and it's a research problem. Not a 'product problem'. You could do a whole PhD thesis on this alone. There are so many different ways to approach this.
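For a sense of what the "listed/numeric information" route looks like, here is a minimal sketch, assuming a small table of past listings with a known price. It uses a nearest-neighbour "comparables" lookup as a stand-in technique; the feature set, the listings and the function name estimate_price are all hypothetical, and a real baseline would need far more data plus location features.

```python
import math

# Hypothetical listings: (area_m2, rooms, age_years, price_eur). Made-up numbers.
LISTINGS = [
    (50, 2, 30, 100_000),
    (55, 2, 10, 120_000),
    (70, 3, 20, 150_000),
    (80, 3, 5, 185_000),
    (100, 4, 15, 220_000),
    (120, 5, 2, 290_000),
]


def _scales(rows):
    """Range of each feature, so area, rooms and age are comparable."""
    n_features = len(rows[0]) - 1  # last column is the price
    return [
        (max(r[i] for r in rows) - min(r[i] for r in rows)) or 1.0
        for i in range(n_features)
    ]


def estimate_price(features, rows=LISTINGS, k=3):
    """Return (low, mean, high) price of the k most similar listings."""
    scales = _scales(rows)

    def distance(row):
        # zip stops after the feature columns, so the price is ignored here.
        return math.sqrt(
            sum(((f - r) / s) ** 2 for f, r, s in zip(features, row, scales))
        )

    nearest = sorted(rows, key=distance)[:k]
    prices = [r[-1] for r in nearest]
    return min(prices), sum(prices) / len(prices), max(prices)


low, mean, high = estimate_price((75, 3, 10))
print(f"price range: {low} to {high}, point estimate: {mean:.0f}")
```

Returning a range alongside the point estimate matches the "price range as a starting point" idea, and the width of that range gives a rough feel for how much the comparables disagree.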
Honestly I've given you the entire business plan you're looking for here lmao. Only reason I'm comfortable doing this is because what you're imagining is not really a feasible business plan except for at the very, very basic level.
Like, if you had a team that could pull any of these off, they would be working at AirBnB, Zillow or some other major real estate company already.
If those investors are feeling particularly generous and give you several years and a 7-figure budget, then this might be worth considering. Otherwise...
Sorry for the late reply but I had a very busy period. In the end, I found a small Greek ML company that was excited about the project and we entered deeper discussions. I also updated my post to reflect this. Have a great day! :)
Could be done, but it would require about 6 months for data collection and experimentation. And I think the going rate for 6-month contracts for AI engineers in the UK is 30k, and in the US upwards of 60k.
Good luck bro
I'm not even convinced it's possible based on the requirements. You're not going to get structured data. Just pictures of the outside and inside of the house, I assume. How are you going to reliably estimate livable space, current state, or even number of rooms when not even all rooms might be properly pictured? You're banking on extracting these features with high accuracy from what I assume to be suboptimal images (very doubtful tbh) and then estimating price based on the features, which is useless if the features aren't extracted properly from the images.
Even if this was possible with high enough accuracy, the dataset you would need for this has to be absolutely huge. I really don't believe someone can gather enough in 6 months while simultaneously developing the NN.
And then we're not even talking about the legality of scraping competitors' websites to compare them to.
I'm not convinced I could do this in 6 months and I wouldn't do it for that price.
I dunno if my English is that bad or people are in a rush when reading my post but: you get the images AND information about the residence itself (nr of rooms, total living space, space of each room, a sketch of the place, etc.).
Ah my bad. I think you could make it a bit more clear in your post but it's definitely on me for misunderstanding. If the information about the residence was given in the document itself then it becomes a lot more doable.
I still see quite a few problems, such as neighbourhood, etc. influencing the price, which means you'd need an absolutely huge dataset with very detailed features. And even then I think the accuracy will still not be optimal. Then there's still the issue with scraping competitors' data from their websites, which I doubt is legal.
It really depends on what this will be used for. Want to use this to recommend houses to potential buyers in a certain price range? Absolutely doable, but it seems completely overkill for an application like that. Want to use it to replace humans whose job it is to give price estimations? Probably not a good idea.
Sorry for the late reply but I had a very busy period. In the end, I found a small Greek ML company that was excited about the project and we entered deeper discussions. I also updated my post to reflect this. Have a great day! :)
Wow! Thank you for the concise yet complex reply!
Would you be interested in such a project? Based on your final sentence, I assume not. :(
You need a dataset of a few thousand to a few million examples of inputs (documents + other contextual info like location data) and outputs (estimates, other attributes like number of bedrooms and stuff) in order to build such a feature. Depending on the quality and amount of data that you have and your performance requirements, this can go from a few months' project to nearly impossible to do. (Note: if you have no data like you said, or expect 0 error, then this is impossible to do.)
As far as I can tell, a few 100s of homes would be available to be used as a dataset.
A few hundred is way too few. I would be comfortable with a few thousand homes' data, and more comfortable still if I could scrape Zillow or something on top of that.
(but that has its own issues, both legally and in terms of data drift, since Zillow data would be American while you're European).
[deleted]
Circumventing bot blocking protocols is a trivial matter.
The potential lawsuit, on the other hand, is not.
Sorry for the late reply but I had a very busy period. In the end, I found a small Greek ML company that was excited about the project and we entered deeper discussions. I also updated my post to reflect this. Have a great day! :)
That is likely a couple of orders of magnitude too little data.
The real question is: can you get accurate data? You need to scan pics for a gazillion different kinds of things and in the end provide a number. But in order for this to work, you need a ton of data to provide the correct object matches.
Copy-pasting a comment of mine since it answers your question:
As far as I can tell, a few 100s of homes would be available to be used as a dataset.
Malignant-Koala t1_j40fflp wrote
This is an "idea guy" post, isn't it? ;)
Like, one of those "I have no idea how incredibly hard it would be to even gather the necessary data for this" ideas? An "I need a quote to give to my guy in two days despite being unable to provide you with any more guidance than a vaguely worded pseudo-concept" proposal?
Best of luck dude.