tdgros t1_jdvo1j7 wrote
Reply to comment by [deleted] in Is there a limit to the number of sounds you can hear simultaneously? by xXxjayceexXx
We don't go up to 96kHz because listeners can perceive it, but because it allows for the design of better filters, better processing, etc., which does result in better quality. It would be completely fine to then downsample to 48kHz right before sending the signal to the speakers.
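For illustration, a minimal sketch of that final downsampling step, assuming SciPy and a mono float signal (the 1 kHz test tone is just a stand-in):

```python
import numpy as np
from scipy.signal import resample_poly

fs_in, fs_out = 96_000, 48_000       # process at 96 kHz, play back at 48 kHz
t = np.arange(fs_in) / fs_in
x = np.sin(2 * np.pi * 1000 * t)     # one second of a 1 kHz test tone

# Polyphase resampling applies the anti-aliasing low-pass for us, so any
# content above the new Nyquist frequency (24 kHz) is filtered out.
y = resample_poly(x, up=1, down=2)
assert len(y) == fs_out
```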
tdgros t1_jdqjc8q wrote
Reply to comment by Co0k1eGal3xy in Is it possible to merge transformers? [D] by seraphaplaca2
There's also the weight averaging in ESRGAN, which I knew about, but it has always irked me. The permutation argument from your third point is the usual reason I invoke on this subject, and the paper does show why it's not as simple as just blending weights! The same reasoning also shows why blending subsequent checkpoints isn't like blending random networks.
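A toy NumPy illustration of that permutation argument: permuting the hidden units of an MLP leaves its function unchanged, yet averaging the original with its permuted twin breaks it.

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda v: np.maximum(v, 0)
f = lambda x, A, B: B @ relu(A @ x)        # tiny 2-layer MLP

W1, W2 = rng.normal(size=(8, 4)), rng.normal(size=(3, 8))
x = rng.normal(size=4)

# Permuting the hidden units gives a different-looking but identical model...
perm = rng.permutation(8)
W1p, W2p = W1[perm], W2[:, perm]
print(np.allclose(f(x, W1, W2), f(x, W1p, W2p)))   # True

# ...yet averaging the two weight sets yields a different (broken) function,
# even though both "models" computed exactly the same thing.
W1m, W2m = (W1 + W1p) / 2, (W2 + W2p) / 2
print(np.allclose(f(x, W1, W2), f(x, W1m, W2m)))   # False
```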
tdgros t1_jdqbgqy wrote
the model merging offered by some stable diffusion UIs does not merge the weights of a network! It merges the denoising results for a single diffusion step from 2 different denoisers, which is very different!
Merging the weights of two different models does not, in general, produce something functional, and it can only even be attempted for 2 models with exactly the same structure. It certainly does not "mix their functionality".
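Schematically, that output-level merging looks like the sketch below; `denoiser_a` and `denoiser_b` are hypothetical stand-ins for two full models, both of which run at every step.

```python
def merged_denoise_step(x_t, t, denoiser_a, denoiser_b, alpha=0.5):
    """Blend the *predictions* of two denoisers at one diffusion step.

    Both networks run in full and only their outputs are mixed; no
    weight tensors are ever averaged.
    """
    eps_a = denoiser_a(x_t, t)   # noise prediction from model A
    eps_b = denoiser_b(x_t, t)   # noise prediction from model B
    return alpha * eps_a + (1 - alpha) * eps_b
```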
tdgros t1_jdqarbq wrote
Reply to comment by [deleted] in Is it possible to merge transformers? [D] by seraphaplaca2
what's the connection between LoRA and the question about merging weights here?
edit: weird, I saw a notification for an answer from you, but can't see the message...
LoRA is a parameter-efficient fine-tuning method: it keeps the original weight matrices frozen and learns low-rank additive updates to them for a single task. It does not merge models or weights.
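A minimal PyTorch-style sketch of the idea (the rank and scaling values are just common defaults, not anything from this thread):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                  # base weights stay frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # Effective weight is W + scale * B @ A, where B @ A has low rank.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```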
tdgros t1_jdlxy8a wrote
Reply to comment by shanereid1 in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
There are versions for NLP (and a special one for vision transformers); here is the BERT one, from some of the same authors (Frankle & Carbin): https://proceedings.neurips.cc/paper/2020/file/b6af2c9703f203a2794be03d443af2e3-Paper.pdf
It is still costly, as it involves rewinding and finding masks; we probably need to switch to dedicated sparse computations to fully benefit from it.
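The rewind-and-mask loop, very schematically (`train_fn` is a hypothetical stand-in for a full training run, which is exactly where the cost comes from; a real implementation would also keep pruned weights zeroed throughout training):

```python
import copy
import torch

def lottery_ticket_prune(model, train_fn, rounds=5, prune_frac=0.2):
    init_state = copy.deepcopy(model.state_dict())    # weights to rewind to
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters()}

    for _ in range(rounds):
        train_fn(model)                               # full training run
        for n, p in model.named_parameters():
            surviving = p.abs()[masks[n] > 0]
            thresh = surviving.quantile(prune_frac)   # drop the smallest 20%
            masks[n] = masks[n] * (p.abs() > thresh).float()
        model.load_state_dict(init_state)             # rewind to initialization
        with torch.no_grad():
            for n, p in model.named_parameters():
                p.mul_(masks[n])                      # re-apply the mask
    return masks
```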
tdgros t1_jdbu6w7 wrote
Reply to comment by MisterManuscript in [R] Introducing SIFT: A New Family of Sparse Iso-FLOP Transformations to Improve the Accuracy of Computer Vision and Language Models by CS-fan-101
the SIFT patent expired in March 2020. It's included in the main OpenCV module now (it used to be in the "non-free" contrib part of OpenCV).
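So this now works with a plain OpenCV install ("image.png" is just a placeholder):

```python
import cv2

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)

# Since OpenCV 4.4, SIFT ships with the main module: no contrib build needed.
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
print(len(keypoints), descriptors.shape)   # N keypoints, 128-d descriptors
```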
tdgros t1_jczzeqz wrote
Reply to comment by phira in [P] OpenAssistant is now live on reddit (Open Source ChatGPT alternative) by pixiegirl417
"what does a cow drink?" "Milk"
tdgros t1_jbybye9 wrote
Reply to comment by Egg_Stealer in [D] Simple Questions Thread by AutoModerator
what is love?
tdgros t1_jbt7dy0 wrote
Reply to comment by jobeta in [Discussion] Compare OpenAI and SentenceTransformer Sentence Embeddings by Simusid
OP said UMAP above
tdgros t1_ja71ave wrote
Reply to comment by CellWithoutCulture in [D] Is RL dead/worth researching these days? by [deleted]
>toolformer
Are you sure there's RL in Toolformer? I thought it was mostly self-supervised and fine-tuned.
tdgros t1_j9bfds3 wrote
Reply to comment by harharveryfunny in [D] Something basic I don't understand about Nerfs by alik31239
Just read the post!
>However, the paper itself builds a network that gets as an input 5D vectors (3 location coordinates+2 camera angles) and outputs color and volume density for each such coordinate. I don't understand where do I get those 5D coordinates from? My training data surely doesn't have those - I only have a collection of images.
tdgros t1_j9b43pe wrote
Reply to comment by harharveryfunny in [D] Something basic I don't understand about Nerfs by alik31239
it's downvoted because it doesn't add anything to the conversation; OP has already stated that they know what the inputs are, they just don't know where to get them from. Someone already answered correctly at the top.
tdgros t1_j8snvyp wrote
Reply to comment by sessl in Perfectly spherical explosion spotted 150 million light-years away by Ok_Plum7895
the colored squares at the bottom right are DALL·E's watermark
tdgros t1_j8j2wbd wrote
Reply to [R] Hitchhiker’s Guide to Super-Resolution: Introduction and Recent Advances by Maleficent_Stay_7737
Unless I missed it, the paper does mention that the degradation mapping should be estimated, but it does not detail how, nor cite papers that do it (examples: KernelGAN, KernelNet, doubleDIP or MetaKernelGAN...).
tdgros t1_j8hn77o wrote
Reply to comment by gopher9 in [Discussion] The need for noise in stable diffusion by AdministrationOk2735
This one as well: https://openreview.net/pdf?id=QsVditUhXR
tdgros t1_j8d490l wrote
Reply to [D] Is a non-SOTA paper still good to publish if it has an interesting method that does have strong improvements over baselines (read text for more context)? Are there good examples of this kind of work being published? by orangelord234
Yes; for instance, several NLP papers present ideas that make models competitive with much larger ones.
tdgros t1_j8an84z wrote
Reply to [R] DIGIFACE-1M — synthetic dataset with one million images for face recognition by t0ns0fph0t0ns
they're the same picture
tdgros t1_j7vdocr wrote
With constrained optimization, you usually have a feasible set for the variables you optimize, but in a NN training you optimize millions of weights that aren't directly meaningful, so in general it's not clear how you would define a feasible set for them.
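For contrast, here is what constrained optimization looks like when a feasible set is easy to define: a toy projected-gradient step under a box constraint (all values are made up for the example).

```python
import numpy as np

def project_box(w, lo=-1.0, hi=1.0):
    """Projection onto the feasible set {w : lo <= w_i <= hi}."""
    return np.clip(w, lo, hi)

def projected_gradient_step(w, grad, lr=0.1):
    # Ordinary gradient step, then snap back into the feasible set.
    return project_box(w - lr * grad)

# Toy problem: minimize (w - 3)^2 subject to w in [-1, 1].
w = np.array([0.0])
for _ in range(50):
    w = projected_gradient_step(w, grad=2 * (w - 3))
print(w)   # converges to the constraint boundary, w = 1
```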
tdgros t1_j649t5h wrote
Reply to comment by onFilm in Obsidian handaxe-making workshop from 1.2 million years ago discovered in Ethiopia by rmaccr
it was only 1,180,000 years old at the time, it looked much younger
tdgros t1_j5twgdm wrote
Reply to comment by Biovyn in [OC] Shakira's latest song reached 100M views in under 3 days. The crazy part? Only K-Pop and Asian bands had reached this milestone before. Here are the fastest videos to reach 100M views in YouTube history. by latinometrics
only if it's about Korea, India, Japan and Colombia/Argentina.
tdgros t1_j4vipol wrote
Reply to comment by kingdroopa in [D] Suggestion for approaching img-to-img? by kingdroopa
if the two cameras are rigidly fixed, then you can calibrate them like one calibrates a stereo pair, and at least align the orientations and intrinsics. Points very far from the camera will be well aligned; points very close will remain misaligned.
The calibration process will involve you marking corresponding points by hand, but the maths for the correction is very simple after that.
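A sketch of that correction, assuming the intrinsics K1, K2 and the inter-camera rotation R have already been estimated (e.g. with cv2.stereoCalibrate on checkerboard views; the numbers and filename below are placeholders):

```python
import cv2
import numpy as np

K1 = np.array([[1000., 0., 640.], [0., 1000., 360.], [0., 0., 1.]])
K2 = np.array([[1200., 0., 640.], [0., 1200., 360.], [0., 0., 1.]])
R = np.eye(3)   # placeholder: rotation of camera 2 w.r.t. camera 1

# The "infinite homography" K2 @ R @ K1^-1 aligns orientation and intrinsics
# exactly for points at infinity; nearby points keep a residual parallax
# that no warp of this kind can remove.
H = K2 @ R @ np.linalg.inv(K1)
img1 = cv2.imread("cam1.png")
aligned = cv2.warpPerspective(img1, H, (1280, 720))
```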
tdgros t1_j4rlund wrote
Reply to comment by D20Jawbreaker in Bonobos, unlike humans, are more interested in the emotions of strangers than acquaintances by giuliomagnifico
Maybe you're thinking of Tay, which was shut down after only 16 hours of use, in 2016.
And no, of course it did not turn fascist, but you could easily make it say stupid things, and people did just that.
It's a chatbot, not an actual being capable of thinking.
tdgros t1_j4qbwij wrote
Reply to comment by mfb- in If nuclear fission in U-235 causes the atom to be split into 2 smaller atoms (such as Kr-92 and Ba-141) then how is it that U-236 is produced as waste since the U-235 was just split into smaller peices? by Ian98766
Are the shapes of the curves (clean, a few spikes, lots of spikes, super clean from ~0.002 MeV onward) related to physical processes we know, or is it just due to the scale of the plot?
tdgros t1_j41fn3f wrote
Reply to comment by rehrev in [R] Is there any research on allowing Transformers to spent more compute on more difficult to predict tokens? by Chemont
At train time, you plug decoders in at many levels with the same objective, so you can find out whether some tokens can be decoded earlier, using an additional network that outputs a sort of confidence. At inference time, you run the layers one by one and stop when the confidence is high, which allows you to skip some computation. (It's probably a simplistic description, feel free to correct me)
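In pseudocode-ish Python, the inference side might look like this (all the module names are hypothetical stand-ins):

```python
def early_exit_forward(x, layers, exit_heads, confidence, threshold=0.9):
    """Run layers one by one; stop once an intermediate decoder is trusted.

    layers     : the stack of transformer blocks
    exit_heads : one decoding head per layer, trained with the same objective
    confidence : small network scoring how trustworthy the current output is
    """
    h = x
    for layer, head in zip(layers, exit_heads):
        h = layer(h)
        logits = head(h)
        if confidence(h) > threshold:   # confident enough: skip the rest
            return logits
    return logits                       # fell through: used every layer
```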
tdgros t1_jdx0f2y wrote
Reply to comment by [deleted] in Is there a limit to the number of sounds you can hear simultaneously? by xXxjayceexXx
>Human vision is about 576 megapixels
it's really not, we don't even have that many photoreceptor cells per retina. You can arrive at this figure if you extrapolate the density of the fovea to the entire field of view, but in reality the density of color-sensitive cells drops off sharply outside the fovea, which only covers a few degrees of FOV.
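A back-of-envelope version of that extrapolation (the exact assumptions vary by source; ~0.3 arcmin per "pixel" of foveal acuity over a 120°-by-120° field is one common set):

```python
fov_deg = 120            # assume a 120-degree square field of view
acuity_arcmin = 0.3      # assume foveal resolution of ~0.3 arcmin per "pixel"

pixels_per_side = fov_deg * 60 / acuity_arcmin   # 24,000 "pixels" per side
megapixels = pixels_per_side ** 2 / 1e6
print(megapixels)        # 576.0 -- but only the fovea has this density
```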
Do the test: focus your eyes on one word of text and, without moving your eyes, see how far out you can still read the surrounding words. Our vision is really, really blurry outside the center, we just don't realize it.