Can you please elaborate your answers and quantify?
I'm most interested in the effort for bullets 2 and 3. In your own experience, did it take hours, days, weeks?
That was not the answer I was hoping for, but very helpful :)
Do you have any code/repo to share? I'm only able to find the DistilBERT implementation in Apple's repo; I'd like to see some other examples.
I was hoping to just fine-tune the model and let the training last days at most. Seems like my best chance is to wait for distilled Stable Diffusion and use their CLIP encoder, as u/LetterRip mentions.
This may not be the direct answer, but it's applicable to many problems:
1. Use the simplest approach first. Here that would be a simple model: a flat fully-connected network.
2. Measure the results.
3. If the results aren't good enough, think about what could improve them: a different model architecture, a different training procedure, obtaining more data...
4. Iterate (go to step 2).
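The "simplest approach" in step 1 could look something like this minimal PyTorch sketch. All dimensions here are made up for illustration (a tabular input of ~1K features, one hidden layer):

```python
import torch
import torch.nn as nn

# Hypothetical sizes: ~1K flat input features, binary target.
IN_FEATURES = 1024

# The flat fully-connected baseline: no per-group structure, just one MLP.
model = nn.Sequential(
    nn.Linear(IN_FEATURES, 256),
    nn.ReLU(),
    nn.Linear(256, 1),
)

x = torch.randn(32, IN_FEATURES)  # a batch of 32 examples
logits = model(x)                 # shape: (32, 1)
```

Train this with any standard loop, measure, and only add complexity if the numbers demand it.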
Also:
`creating linear or embedding layers for each feature group before combining them together` - this injects prior knowledge into the network, so it may help. In theory, though, the network should be able to discover that structure on its own: combinations that don't make sense will end up with weights close to zero. That's why I advise you to start without it (and only try adding it afterwards).
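For reference, the per-group variant being discussed could be sketched like this in PyTorch. The feature groups, vocabulary sizes, and dimensions are all hypothetical, just to show the pattern of embedding each group before concatenation:

```python
import torch
import torch.nn as nn

# Hypothetical setup: two categorical feature groups plus a numeric block.
# Each categorical group gets its own embedding before everything is combined.
class GroupedNet(nn.Module):
    def __init__(self, n_cat_a=100, n_cat_b=50, n_numeric=20, emb_dim=16):
        super().__init__()
        self.emb_a = nn.Embedding(n_cat_a, emb_dim)
        self.emb_b = nn.Embedding(n_cat_b, emb_dim)
        self.head = nn.Sequential(
            nn.Linear(emb_dim * 2 + n_numeric, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, cat_a, cat_b, numeric):
        # Concatenate the per-group representations, then apply the shared head.
        x = torch.cat([self.emb_a(cat_a), self.emb_b(cat_b), numeric], dim=-1)
        return self.head(x)

model = GroupedNet()
out = model(torch.tensor([3, 7]), torch.tensor([1, 4]), torch.randn(2, 20))
# out has shape (2, 1)
```

Compared to the flat baseline, this only pays off if the grouping really does encode useful structure, which is exactly why measuring both is the safer path.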
1K+ features: in some settings that's a lot of features, in others it isn't a big number... but it may make sense to reduce the feature count with a dimensionality-reduction technique.
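As one concrete option, PCA via an SVD of the centered data is a common starting point. A toy NumPy sketch with made-up sizes (500 samples, 1024 features, reduced to 64):

```python
import numpy as np

# Toy illustration: project 1024-dimensional features down to 64 with PCA.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1024))      # 500 samples, 1K+ features each

X_centered = X - X.mean(axis=0)
# Right singular vectors of the centered data are the principal axes.
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
X_reduced = X_centered @ Vt[:64].T    # shape: (500, 64)
```

Whether 64 (or any other target dimension) is appropriate depends on how much variance the leading components actually capture, so it's worth checking the singular values before committing.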
alkibijad OP t1_j7f1b2e wrote
Reply to comment by vade in [D] Apple's ane-transformers - experiences? by alkibijad
Looking forward to hearing their experiences!