GPT-3 links

and some renewed interest in Transformers

Gwern https://www.gwern.net/newsletter/2020/05#gpt-3

HN comments where I found the link https://news.ycombinator.com/item?id=23623845

Some intuition on the relation between graph neural networks and the Transformer architecture https://graphdeeplearning.github.io/post/transformers-are-gnns/

I should find a good intro to Transformers. They seem to scale the right way: ever bigger models keep delivering ever better performance, as if size drives performance.

Viagra for growth

Sildenafil is Viagra, Tadalafil is Cialis, Vardenafil is the #meToo with the weaker brand, I don't remember which one

Ash Jogalekar (@curiouswavefn) asks: “Show me a molecular modeling or machine learning technique that predicts tadalafil (Cialis) from sildenafil (Viagra) without the former being explicitly represented in the training set, purely based on target information. That’s the kind of scaffold hop that will turn heads.”

Not all biochemistry is that predictable, but AlphaFold's performance in the protein-folding tournament left a big impression

For a while now the thinking has been that injecting a bit of tech (read: software and ML) into pharma research can turn #eroomLaw into #MooreLaw. If you don't believe it, stop by A16z, which has started using the term Bio as a buzzword analogous to Tech: tech is no longer an abbreviation of technology, just as Bio is meant to be a short, recognizable label for an investment thesis in bioscience https://a16z.com/bio/

Only now the excitement over AlphaFold's success makes some people think that the Great Stagnation too may have come to an end. In short, AlphaFold is Viagra for economists

Tyler Cowen, who coined the term in his 2011 book, is wondering about it. And a bit of tech optimism shows up here and there

2 readings for the holiday: dystopia and coup

Noah Smith proposes a dystopia, plausible but in his view not the most likely one. At the center, as always, is China with its mix of authoritarianism and technology: the party arming itself with artificial intelligence to control every small event in the lives of the Chinese, training it all on the oppression of the Uyghurs https://noahpinion.substack.com/p/the-super-scary-theory-of-the-21st

Zeynep Tufekci, a US sociologist of Turkish origin, laments that whereas in Turkey there are several terms to describe each kind of coup, in the USA they have only one, and so they fail to fully grasp the risk in Trump's ridiculous behavior after the elections, which comes after years of occupation and debasement of American politics, gerrymandering etc.

Napoleon III too was a clown who was brought to power precisely for being a caricature, only to then seize it and make himself emperor. Of him Marx said: “Hegel says somewhere that great historic facts and personages recur twice. He forgot to add: ‘Once as tragedy, and again as farce.’” https://www.theatlantic.com/ideas/archive/2020/12/trumps-farcical-inept-and-deadly-serious-coup-attempt/617309/

of Biochemistry and Geometry

Some Nobel prize was awarded 50 years ago to someone who claimed a protein’s shape could be derived from the atoms building it, and so began a race to guess how a protein would fold: protein folding, in short. It made the artificial intelligence news this week, when DeepMind's AlphaFold won a sort of protein-folding Olympics https://www.nature.com/articles/d41586-020-03348-4

Twitter reminds me how important, and how difficult to deal with, an atom’s position in a molecule is in determining its properties https://twitter.com/curiouswavefn/status/1334562488495423489

While reading PIHKAL by Shulgin you can find a chapter named the 4-position, which shows how hallucinogenic potency derives from what takes the 4-position on the benzene ring and stays there while in our body

Orthogonality, USA elections and AI

In my USA bubble on Twitter, mostly founders and founder philosophers, Robin Hanson is making the rounds with his post that advises: To Oppose Polarization, Tug Sideways

Americans are in shock after 2 boomers, worse, 2 silent generationers (1942 and 1946), babbled at each other without much sense for 1 hour. Politics is a multidimensional tug-of-war, and in certain cases it is better to mind your own fight and tug sideways

Orthogonality is a condition that ensures that AIs need not obliterate us humans https://arbital.com/p/orthogonality/ found on the Twitter of

founder of the rationalist community LessWrong and of this nice Arbital site with lots of tutorials

The purpose of advocating orthogonality is different

To Hanson, when politics gets factious it is better to go orthogonal

The AIs in Moravec's dialogue with … think of him as too factitious and cannot be bothered any more with the real 4-dimensional world; they have gone their own multidimensional way

Here we come to orthogonality in Y's meaning: AIs need not get confrontational with us, there’s plenty of orthogonal room 🙂

The day of the @@@morphic words

“I clarify: that was truth, not humor. The GPT setup is precisely isomorphic to training a (huge) neural net to compress online text to ever-smaller strings, then using the trained net to decompress random bytes.”
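
The compression reading of that quote can be made concrete with a toy that has nothing to do with GPT itself. This is a minimal sketch, assuming a per-character frequency model stands in for the neural net (the text, the two models and the function name `ideal_bits` are all made up for illustration): a model that predicts the text better assigns it a shorter ideal code length, which is the sense in which learning to predict and learning to compress go together.

```python
import math
from collections import Counter


def ideal_bits(text: str, probs: dict) -> float:
    """Shannon code length, in bits, of `text` under a per-character model."""
    return sum(-math.log2(probs[ch]) for ch in text)


# Toy corpus, chosen only for illustration.
text = "the cat sat on the mat"
alphabet = sorted(set(text))

# Model 1: knows nothing, every character is equally likely.
uniform = {ch: 1 / len(alphabet) for ch in alphabet}

# Model 2: "trained" on the text, it uses the observed character frequencies.
counts = Counter(text)
fitted = {ch: counts[ch] / len(text) for ch in alphabet}

uniform_bits = ideal_bits(text, uniform)
fitted_bits = ideal_bits(text, fitted)
print(f"uniform model: {uniform_bits:.1f} bits, fitted model: {fitted_bits:.1f} bits")
```

The fitted model needs fewer bits than the uniform one; a real language model plays the same game with far richer context than single-character frequencies.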

Isomorphism: a mapping between 2 structures that preserves the structure and can be reversed.

“As organisms become more and more complex through evolution, they need to model reality with increasing accuracy to stay fit. At all times, their representation of reality must be homomorphic with reality itself. Or in other words, the true structure of our world must be preserved when converted into your brain’s representation of it.”

from here https://savsidorov.substack.com/p/tldr-the-interface-theory-of-perception

A homomorphism in algebra is a structure-preserving map between 2 algebraic structures of the same type. Unlike an isomorphism, it need not be reversible.
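
To pin down the difference between the two words, here is a small sketch with two textbook maps picked for illustration (they do not come from the linked posts, and the names `to_mod` and `to_powers` are hypothetical). Reducing integers modulo 5 preserves addition but loses information, so it is a homomorphism only. Sending k to 2**k mod 5 turns addition modulo 4 into multiplication modulo 5 and can be undone, so it is an isomorphism.

```python
from itertools import product

N = 5


def to_mod(x: int) -> int:
    """Homomorphism from (integers, +) to (integers mod N, + mod N)."""
    return x % N


# Structure is preserved: mapping a sum equals summing the mapped values.
sample = range(-10, 11)
hom_preserves = all(
    to_mod(a + b) == (to_mod(a) + to_mod(b)) % N
    for a, b in product(sample, repeat=2)
)
# ...but two different inputs share one output, so the map cannot be reversed.
hom_collides = to_mod(2) == to_mod(7)


def to_powers(k: int) -> int:
    """Isomorphism from (Z_4, + mod 4) to ({1, 2, 3, 4}, * mod 5)."""
    return pow(2, k, 5)


z4 = range(4)
iso_preserves = all(
    to_powers((a + b) % 4) == (to_powers(a) * to_powers(b)) % 5
    for a, b in product(z4, repeat=2)
)
# Every output comes from exactly one input, so an inverse table exists.
inverse = {to_powers(k): k for k in z4}
iso_reversible = len(inverse) == 4 and all(inverse[to_powers(k)] == k for k in z4)

print(hom_preserves, hom_collides, iso_preserves, iso_reversible)
```

All four checks come out true: both maps preserve structure, but only the second one can be run backwards.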

Brain links 16 sep 2020

Joscha Bach on GPT-3 https://www.youtube.com/watch?v=FMfA6i60WDA GPT-3 moves in a semantic space, masters relations between words, but can only deepfake understanding

Joscha built the MicroPsi architecture based on Psi theory https://en.wikipedia.org/wiki/Psi-theory#MicroPsi_architecture

Curious Wavefunction writes a history of information and thermodynamics http://wavefunction.fieldofscience.com/2020/07/brains-computation-and-thermodynamics.html

It ends with the idea that our brain is a mixture of digital and analog processes, as posited by von Neumann

Sav Sidorov: another Joscha Bach video, with highlights https://savsidorov.substack.com/p/tldr-joscha-bach-artificial-consciousness

“Some people think that a simulation can’t be conscious, and only a physical system can, but they got it completely backwards. A physical system cannot be conscious, only a simulation could be conscious.”

AI = General methods + power of the computers

like Communism was electrification + power of the Soviets.

The Bitter Lesson of Rich Sutton http://www.incompleteideas.net/IncIdeas/BitterLesson.html


From Gwern's May newsletter on scaling and meta-learning:

“The scaling hypothesis regards the blessings of scale as the secret of AGI: intelligence is ‘just’ simple neural units & learning algorithms applied to diverse experiences at a (currently) unreachable scale.”

Somehow related: distributed intelligence and fungi, from The Curious Wavefunction, Life Distributed http://wavefunction.fieldofscience.com/2020/09/life-distributed.html

Scaling hypothesis in AI

start from Gwern here https://www.gwern.net/newsletter/2020/05#gpt-3

Parameter scaling in GPT-3 runs into neither merely linear scaling of performance nor diminishing returns. Rather, it shows meta-learning enhancing performance

It was forecast by Moravec, and since we are in a fat-tail phenomenon this holds true: “the scaling hypothesis is so unpopular an idea, and difficult to prove in advance rather than as a fait accompli”. Before GPT-3, another epiphany on scaling was the Google cat moment, which started the deep learning craze

Another idea I like is that models like GPT-3 are definitely cheap, and if they show superlinear growth it is a no-brainer to go for bigger and more complex models. It is a long way before they match the billions spent on CERN or nuclear fusion.

Craig Venter's synthetic bacteria project cost 40 million USD; groundbreaking projects costing so little should not be forgone

BTW, to grasp how there could be a scaling benefit in growing deep learning model sizes, look no further than a simple, unfounded but suggestive analogy with Metcalfe's law of networks: network value grows with the square of the number of nodes.
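
The arithmetic behind that analogy can be sketched in a few lines. This assumes the count of possible pairwise links, n(n-1)/2, is a fair proxy for the "value" in Metcalfe's law (the function name `possible_links` and the node counts are made up for illustration), and it says nothing about how neural networks actually scale.

```python
def possible_links(nodes: int) -> int:
    """Number of distinct pairwise connections among `nodes` nodes."""
    return nodes * (nodes - 1) // 2


# Hypothetical network sizes, each 10x the previous one.
sizes = [10, 100, 1_000, 10_000]
links = [possible_links(n) for n in sizes]

for n, v in zip(sizes, links):
    print(f"{n:>6} nodes -> {v:>10} possible links")

# How much the link count grows each time the node count grows 10x.
growth = [links[i + 1] / links[i] for i in range(len(links) - 1)]
print(growth)
```

Each 10x step in nodes buys roughly 100x in possible links, which is the quadratic growth the analogy leans on.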