Mar 12, 2022 – Jan 03, 2024
Questions
- How do we design benchmarks that resist shortcut learning and
actually measure how capable a model is (at NLP, for example)?
- Should AI progress be driven by chasing SOTA on benchmarks, or
are there viable alternative routes?
- How can we measure the “sentience” of an AI? Parrots are nowhere
near as capable as GPT-3 at language, yet we intuit that they are
more conscious than GPT-3 is.
- Why is that?
- Is this intuition correct?
- Has the AI Spring/Winter pattern cropped up in any other field?
Is it unique to AI, or is there some underlying cause for the
recurring over-optimism?
- Why is symmetry so prevalent in nature? Can symmetry be harnessed
for neural network design? (See the equivariance sketch after this
list.)
- Could you do policy iteration with a neural network as the
function approximator? (See the policy-iteration sketch after this
list.)
- How would you adapt it to continuous state and action spaces?
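On the symmetry question, a minimal numpy sketch of one way symmetry
can be harnessed: averaging a linear layer over a group's actions
(here just {identity, horizontal flip}) makes it exactly invariant
to that group. The layer and names are illustrative, not from any
particular paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def flip(x):
    """Group action: reverse the feature axis (a 1-D 'horizontal flip')."""
    return x[..., ::-1]

class FlipInvariantLinear:
    """A linear map symmetrized over {identity, flip}, so its output
    is unchanged when the input is flipped."""
    def __init__(self, d_in, d_out):
        self.W = rng.normal(scale=d_in ** -0.5, size=(d_in, d_out))

    def __call__(self, x):
        # Average over all group elements (a Reynolds operator).
        return 0.5 * (x @ self.W + flip(x) @ self.W)

layer = FlipInvariantLinear(d_in=8, d_out=4)
x = rng.normal(size=(2, 8))
assert np.allclose(layer(x), layer(flip(x)))  # invariance holds exactly
```

CNNs do the analogous thing for translations; the open question is
which other symmetries are worth building in versus learning.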
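On policy iteration with a neural network: the standard recipe is
approximate policy iteration, where policy evaluation becomes
regression of Q toward Bellman targets and improvement stays greedy.
A toy sketch (hypothetical random MDP; a linear Q over one-hot
features stands in for a network so the example stays numpy-only):

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 2, 0.9

# Hypothetical random MDP: P[s, a] is a distribution over next states.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.normal(size=(n_states, n_actions))

W = np.zeros((n_states, n_actions))   # Q-function "network" parameters
policy = np.zeros(n_states, dtype=int)

for _ in range(30):
    # Approximate policy evaluation: SGD on the TD error of Q^pi.
    for _ in range(500):
        s = rng.integers(n_states)
        a = rng.integers(n_actions)            # evaluate Q^pi at all (s, a)
        s2 = rng.choice(n_states, p=P[s, a])
        target = R[s, a] + gamma * W[s2, policy[s2]]
        W[s, a] += 0.1 * (target - W[s, a])
    # Policy improvement: greedy with respect to the learned Q.
    policy = np.argmax(W, axis=1)

print("greedy policy:", policy)
```

For continuous action spaces the greedy argmax is intractable, so
the usual adaptation (DDPG-style actor-critic) trains a second
network actor(s) to approximately maximize Q(s, actor(s)), and the
critic's targets use actor(s') in place of the argmax.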
To-read
Bibliography
- Chiang, T. (2023, February 9). ChatGPT Is a Blurry JPEG of the
Web. The New Yorker.
https://www.newyorker.com/tech/annals-of-technology/chatgpt-is-a-blurry-jpeg-of-the-web
- Argyle, L. P., Busby, E. C., Fulda, N., Gubler, J., Rytting,
C., & Wingate, D. (2022). Out of One, Many: Using Language
Models to Simulate Human Samples (arXiv:2209.06899). arXiv.
http://arxiv.org/abs/2209.06899
- Benton, G. W., Maddox, W. J., Lotfi, S., & Wilson, A. G.
(2021). Loss Surface Simplexes for Mode Connecting Volumes and
Fast Ensembling (arXiv:2102.13042). arXiv.
http://arxiv.org/abs/2102.13042
- Chan, S. C. Y., Santoro, A., Lampinen, A. K., Wang, J. X.,
Singh, A., Richemond, P. H., McClelland, J., & Hill, F.
(2022). Data Distributional Properties Drive Emergent In-Context
Learning in Transformers (arXiv:2205.05055). arXiv.
http://arxiv.org/abs/2205.05055
- Cong, Y., & Zhao, M. (2022). Big Learning: A Universal
Machine Learning Paradigm? (arXiv:2207.03899). arXiv.
http://arxiv.org/abs/2207.03899
- Delétang, G., Ruoss, A., Grau-Moya, J., Genewein, T.,
Wenliang, L. K., Catt, E., Hutter, M., Legg, S., & Ortega, P.
A. (2022). Neural Networks and the Chomsky Hierarchy
(arXiv:2207.02098). arXiv. http://arxiv.org/abs/2207.02098
- Dohan, D., Xu, W., Lewkowycz, A., Austin, J., Bieber, D.,
Lopes, R. G., Wu, Y., Michalewski, H., Saurous, R. A.,
Sohl-Dickstein, J., Murphy, K., & Sutton, C. (2022). Language
Model Cascades (arXiv:2207.10342). arXiv.
http://arxiv.org/abs/2207.10342
- Ha, D., & Tang, Y. (2022). Collective Intelligence for
Deep Learning: A Survey of Recent Developments (arXiv:2111.14377).
arXiv. http://arxiv.org/abs/2111.14377
- Haluptzok, P., Bowers, M., & Kalai, A. T. (2022). Language
Models Can Teach Themselves to Program Better (arXiv:2207.14502).
arXiv. http://arxiv.org/abs/2207.14502
- Hoffmann, J., Borgeaud, S., Mensch, A., Buchatskaya, E., Cai,
T., Rutherford, E., Casas, D. de L., Hendricks, L. A., Welbl, J.,
Clark, A., Hennigan, T., Noland, E., Millican, K., Driessche, G.
van den, Damoc, B., Guy, A., Osindero, S., Simonyan, K., Elsen,
E., … Sifre, L. (2022). Training Compute-Optimal Large Language
Models (arXiv:2203.15556). arXiv.
http://arxiv.org/abs/2203.15556
- Jaderberg, M., Czarnecki, W. M., Osindero, S., Vinyals, O.,
Graves, A., Silver, D., & Kavukcuoglu, K. (2017). Decoupled
Neural Interfaces using Synthetic Gradients (arXiv:1608.05343).
arXiv. http://arxiv.org/abs/1608.05343
- Lehman, J., Gordon, J., Jain, S., Ndousse, K., Yeh, C., &
Stanley, K. O. (2022). Evolution through Large Models
(arXiv:2206.08896). arXiv. http://arxiv.org/abs/2206.08896
- Liu, Z., Kitouni, O., Nolte, N., Michaud, E. J., Tegmark, M.,
& Williams, M. (2022). Towards Understanding Grokking: An
Effective Theory of Representation Learning (arXiv:2205.10343).
arXiv. http://arxiv.org/abs/2205.10343
- McDermott, D. (1976). Artificial intelligence meets natural
stupidity. ACM SIGART Bulletin, 57, 4–9.
https://doi.org/10.1145/1045339.1045340
- Power, A., Burda, Y., Edwards, H., Babuschkin, I., &
Misra, V. (2022). Grokking: Generalization Beyond Overfitting on
Small Algorithmic Datasets (arXiv:2201.02177). arXiv.
http://arxiv.org/abs/2201.02177
- Richards, B. A., Lillicrap, T. P., Beaudoin, P., Bengio, Y.,
Bogacz, R., Christensen, A., Clopath, C., Costa, R. P., de Berker,
A., Ganguli, S., Gillon, C. J., Hafner, D., Kepecs, A.,
Kriegeskorte, N., Latham, P., Lindsay, G. W., Miller, K. D., Naud,
R., Pack, C. C., … Kording, K. P. (2019). A deep learning
framework for neuroscience. Nature Neuroscience, 22(11),
1761–1770. https://doi.org/10.1038/s41593-019-0520-2
- Sejnowski, T. (2022). Large Language Models and the Reverse
Turing Test (arXiv:2207.14382). arXiv.
http://arxiv.org/abs/2207.14382
- Tay, Y., Dehghani, M., Abnar, S., Chung, H. W., Fedus, W.,
Rao, J., Narang, S., Tran, V. Q., Yogatama, D., & Metzler, D.
(2022). Scaling Laws vs Model Architectures: How does Inductive
Bias Influence Scaling? (arXiv:2207.10551). arXiv.
http://arxiv.org/abs/2207.10551
- Sutton, R. S. (2019, March 13). The Bitter Lesson.
http://www.incompleteideas.net/IncIdeas/BitterLesson.html
- Vogelstein, J. T., Verstynen, T., Kording, K. P., Isik, L.,
Krakauer, J. W., Etienne-Cummings, R., Ogburn, E. L., Priebe, C.
E., Burns, R., Kutten, K., Knierim, J. J., Potash, J. B., Hartung,
T., Smirnova, L., Worley, P., Savonenko, A., Phillips, I., Miller,
M. I., Vidal, R., … Yang, W. (2022). Prospective Learning: Back to
the Future (arXiv:2201.07372). arXiv.
http://arxiv.org/abs/2201.07372
- Zador, A. M. (2019). A critique of pure learning and what
artificial neural networks can learn from animal brains. Nature
Communications, 10(1), 3770.
https://doi.org/10.1038/s41467-019-11786-6
- Zhang, H., Cisse, M., Dauphin, Y. N., & Lopez-Paz, D.
(2018). mixup: Beyond Empirical Risk Minimization
(arXiv:1710.09412). arXiv. http://arxiv.org/abs/1710.09412