StellaAthena t1_je3tz04 wrote on March 29, 2023 at 5:28 AM

Reply to comment by regalalgorithm in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-

I found this analysis incredibly unconvincing. They used a weaker deduplication criterion than is standard, as well as a weaker analysis than the one they did for the GPT-3 paper.
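To make "weaker standard for deduplication" concrete, here is a minimal sketch of one common style of contamination check: flag a benchmark example if it shares any word-level n-gram with the training data. The function names, the whitespace tokenization, and the default n=13 are assumptions for illustration. This is not a reproduction of the procedure from either the GPT-3 or the GPT-4 report.

```python
def word_ngrams(text: str, n: int) -> set:
    # Lowercase and split on whitespace (a deliberately simple normalization; assumption)
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def is_contaminated(eval_example: str, training_docs: list, n: int = 13) -> bool:
    # Hypothetical helper: flag the eval example if ANY of its word n-grams
    # occurs in ANY training document
    eval_grams = word_ngrams(eval_example, n)
    if not eval_grams:
        return False  # fewer than n words: this sketch cannot check it
    return any(eval_grams & word_ngrams(doc, n) for doc in training_docs)


train = ["The quick brown fox jumps over the lazy dog"]
print(is_contaminated("quick brown fox jumps", train, n=4))  # True
print(is_contaminated("quick brown cat jumps", train, n=4))  # False
```

Lowering n, or normalizing text more aggressively, flags more examples as contaminated (a stronger standard). Requiring long exact substrings or whole-example matches flags fewer.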

StellaAthena t1_jdydjg4 wrote on March 28, 2023 at 2:17 AM

Reply to comment by OkWrongdoer4091 in [D] ICML 2023 Reviewer-Author Discussion by zy415

I have four papers. Two have received no responses, and on one all three reviewers said “thanks but I’ll keep my score” with no further elaboration. On the 7/7/2 paper, the 2 and one of the 7s argued, while the third reviewer remained silent. All told, 5 of 12 reviewers responded.

StellaAthena t1_jdotklz wrote on March 26, 2023 at 12:42 AM

Reply to comment by Puzzleheaded_Acadia1 in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai

It’s somewhat worse and a little faster.

StellaAthena t1_jdotc87 wrote on March 26, 2023 at 12:40 AM

Reply to comment by Ph0masta in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai

It’s its own block, not connected to anything.

StellaAthena t1_jdi094w wrote on March 24, 2023 at 3:13 PM

Reply to comment by ILOVETOCONBANDITS in [D] ICML 2023 Reviewer-Author Discussion by zy415

I just posted in response to each reviewer:

> Thank you for taking the time to review our work. We have carefully considered your comments and have provided a thorough rebuttal addressing your concerns. If you feel that your comments have been adequately addressed, we would greatly appreciate it if you could update your score to reflect that. We are also more than happy to continue this conversation over the next few days until the March 26th deadline.

I submitted several papers, all of which got borderline scores (averages between 4.3 and 5.3), though one got 7/7/2 (yikes!). I had been hopeful that a strong rebuttal could nudge one of them over the line, but the longer it goes without any responses or updates, the more discouraged I get.

StellaAthena t1_jd9emj8 wrote on March 22, 2023 at 7:47 PM

Reply to comment by Astaligorn in [D] ICML 2023 Reviewer-Author Discussion by zy415

“We are glad that you view our work as impactful enough to warrant extension to other domains”

StellaAthena OP t1_jaom9of wrote on March 2, 2023 at 10:52 PM

Reply to comment by starlistener in [N] EleutherAI has formed a non-profit by StellaAthena

Definitely! Come check out our discord server and introduce yourself.

StellaAthena OP t1_jao8e46 wrote on March 2, 2023 at 9:20 PM

Reply to comment by keepthepace in [N] EleutherAI has formed a non-profit by StellaAthena

No, it does not. In the past, though, we felt that the best way to achieve our goals was to focus almost exclusively on training large models, and we no longer feel that’s the case.

StellaAthena t1_is7iss2 wrote on October 13, 2022 at 9:28 PM

Reply to [R]Wq can be omited in single head attention by wangyi_fudan

The proof is even simpler: (xW_q)(xW_k)^T = x(W_q W_k^T)x^T = xWx^T, where W = W_q W_k^T.

The problem is that W_q and W_k are not square matrices. They are d_model x d_head, so their product W = W_q W_k^T is d_model x d_model. In practice d_model >> d_head (e.g., 4096 and 256, respectively, in GPT-J). Doing it your way uses a lot more memory and compute: d_model^2 parameters per head instead of 2 * d_model * d_head, which is 8x as many at GPT-J's sizes.
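A minimal numpy sketch of the argument above. The toy sizes used for the numerical check (n=5 tokens, d_model=64, d_head=8) and the helper names are assumptions for illustration; the parameter count uses the GPT-J sizes from the comment (d_model=4096, d_head=256). It checks that the merged form gives the same attention logits, that W is d_model x d_model but has rank at most d_head, and how many parameters each form stores per head.

```python
import numpy as np


def scores_two_matrices(x, w_q, w_k):
    # Standard single-head attention logits: (x W_q)(x W_k)^T, shape (n, n)
    return (x @ w_q) @ (x @ w_k).T


def scores_merged(x, w):
    # Merged form: x W x^T with W = W_q W_k^T, shape (d_model, d_model)
    return x @ w @ x.T


def param_counts(d_model, d_head):
    # Parameters for one head: separate W_q and W_k vs. a single merged W
    separate = 2 * d_model * d_head
    merged = d_model * d_model
    return separate, merged


rng = np.random.default_rng(0)
n, d_model, d_head = 5, 64, 8  # toy sizes (assumption), kept small for speed
x = rng.standard_normal((n, d_model))
w_q = rng.standard_normal((d_model, d_head))
w_k = rng.standard_normal((d_model, d_head))
w = w_q @ w_k.T

# Both forms give the same logits, up to floating-point error
print(np.allclose(scores_two_matrices(x, w_q, w_k), scores_merged(x, w)))  # True

# The merged matrix is d_model x d_model but has rank at most d_head
print(w.shape, np.linalg.matrix_rank(w))  # (64, 64) 8

# GPT-J-like sizes from the comment
separate, merged = param_counts(4096, 256)
print(separate, merged, merged // separate)  # 2097152 16777216 8
```

Compute scales the same way: projecting n tokens costs about 2 * n * d_model * d_head multiply-adds with separate matrices versus n * d_model^2 with the merged W, and the final score product costs n^2 * d_model instead of n^2 * d_head.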