StellaAthena
StellaAthena t1_jdydjg4 wrote
Reply to comment by OkWrongdoer4091 in [D] ICML 2023 Reviewer-Author Discussion by zy415
I have four papers. Two have no comments, one has all three reviewers say “thanks but I’ll keep my score” with no further elaboration. The 7/7/2 paper had the 2 and one of the 7s argue and the third reviewer remained silent. All told, 5/12 responded.
StellaAthena t1_jdotklz wrote
Reply to comment by Puzzleheaded_Acadia1 in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
It’s somewhat worse and a little faster.
StellaAthena t1_jdotc87 wrote
Reply to comment by Ph0masta in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
It’s its own block, not connected to anything.
StellaAthena t1_jdi094w wrote
Reply to comment by ILOVETOCONBANDITS in [D] ICML 2023 Reviewer-Author Discussion by zy415
I just posted in response to each reviewer:
> Thank you for taking the time to review our work. We have carefully considered your comments and have provided a thorough rebuttal addressing your concerns. If you feel that your comments have been adequately addressed, we would greatly appreciate it if you could update your score to reflect that. We are also more than happy to continue this conversation over the next few days until the March 26th deadline.
I submitted several papers, all of which got borderline scores (average between 4.3 and 5.3), though one got 7 / 7 / 2 (yikes!). I had been hopeful that a strong rebuttal could nudge one of them over the line, but the longer it goes without any response or updates the more discouraged I get.
StellaAthena t1_jd9emj8 wrote
Reply to comment by Astaligorn in [D] ICML 2023 Reviewer-Author Discussion by zy415
“We are glad that you view our work as impactful enough to warrant extension to other domains”
StellaAthena OP t1_jaom9of wrote
Reply to comment by starlistener in [N] EleutherAI has formed a non-profit by StellaAthena
Definitely! Come check out our Discord server and introduce yourself.
StellaAthena OP t1_jao8e46 wrote
Reply to comment by keepthepace in [N] EleutherAI has formed a non-profit by StellaAthena
No, it does not. In the past, though, we felt that the best way to achieve our goals was to focus almost exclusively on training large models, and we no longer feel that’s the case.
Submitted by StellaAthena t3_11g4a9p in MachineLearning
StellaAthena t1_is7iss2 wrote
The proof is even simpler: (xW_q)(xW_k)^T = x(W_qW_k^T)x^T = xWx^T, where W = W_qW_k^T.
The problem is that W_q and W_k are not square matrices. They are d_model by d_head, and so their product W_qW_k^T is d_model x d_model. In practice d_model >> d_head (e.g., they’re 4096 and 256 respectively in GPT-J). Doing it your way uses a lot more memory and compute.
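To make that concrete, here is a rough sketch in plain Python. The toy sizes, the random inputs, and the helper names (`matmul`, `transpose`, `rand_matrix`, `param_counts`) are all my own assumptions for illustration, not anything from a real codebase. It checks that the factored and merged forms give the same attention scores, then counts the parameters per head for each form at the GPT-J sizes mentioned above.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]

def rand_matrix(rows, cols, rng):
    """Random matrix with entries in [-1, 1]; illustrative only."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
seq_len, d_model, d_head = 3, 8, 2  # toy sizes (assumed), chosen so d_model > d_head

x = rand_matrix(seq_len, d_model, rng)    # token representations
W_q = rand_matrix(d_model, d_head, rng)   # query projection
W_k = rand_matrix(d_model, d_head, rng)   # key projection

# Factored form: (x W_q)(x W_k)^T
scores_factored = matmul(matmul(x, W_q), transpose(matmul(x, W_k)))

# Merged form: x W x^T with W = W_q W_k^T, a d_model x d_model matrix
W = matmul(W_q, transpose(W_k))
scores_merged = matmul(matmul(x, W), transpose(x))

# Largest elementwise gap between the two results (floating-point noise only)
max_diff = max(
    abs(a - b)
    for row_f, row_m in zip(scores_factored, scores_merged)
    for a, b in zip(row_f, row_m)
)

def param_counts(d_model, d_head):
    """Parameters per head: two thin matrices vs. one merged square matrix."""
    factored = 2 * d_model * d_head
    merged = d_model * d_model
    return factored, merged

factored, merged = param_counts(4096, 256)  # sizes quoted above for GPT-J
print(max_diff < 1e-9, factored, merged, merged / factored)
```

With those sizes the merged matrix holds 8x as many parameters per head as W_q and W_k together, which is where the extra memory comes from.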
StellaAthena t1_je3tz04 wrote
Reply to comment by regalalgorithm in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
I found this analysis incredibly unconvincing. They used a weaker deduplication standard than is typical, as well as a weaker analysis than the one they did for the GPT-3 paper.