References — Why AI Learning Tools Fail, and What Would Actually Work
Apr. 14, 2026
Cognitive load and working memory
- Sweller, J. (1988). Cognitive load during problem solving: Effects on learning. Cognitive Science, 12(2), 257–285.
- Sweller, J., Ayres, P., & Kalyuga, S. (2011). Cognitive Load Theory. Springer.
- Schroeder, N. L., & Cenkci, A. T. (2018). Spatial contiguity and spatial split-attention effects in multimedia learning environments: A meta-analysis. Educational Psychology Review, 30, 679–701.
- Ginns, P. (2005). Meta-analysis of the modality effect. Learning and Instruction, 15(4), 313–331.
Prior knowledge
- Hattie, J. (2009). Visible Learning: A Synthesis of Over 800 Meta-Analyses Relating to Achievement. Routledge.
- Ausubel, D. P. (1968). Educational Psychology: A Cognitive View. Holt, Rinehart & Winston.
- Simonsmeier, B. A., Flaig, M., Deiglmayr, A., Schalk, L., & Schneider, M. (2022). Domain-specific prior knowledge and learning: A meta-analysis. Educational Psychologist, 57(1), 31–54.
- Chi, M. T. H., Feltovich, P. J., & Glaser, R. (1981). Categorization and representation of physics problems by experts and novices. Cognitive Science, 5(2), 121–152.
- Clark, R. E. (1983). Reconsidering research on learning from media. Review of Educational Research, 53(4), 445–459.
- Russell, T. L. (1999). The No Significant Difference Phenomenon. North Carolina State University.
- Noetel, M., Griffith, S., Delaney, O., Sanders, T., Parker, P., del Pozo Cruz, B., & Lonsdale, C. (2021). Video improves learning in higher education: A systematic review. Review of Educational Research, 91(2), 204–236.
- Deslauriers, L., McCarty, L. S., Miller, K., Callaghan, K., & Kestin, G. (2019). Measuring actual learning versus feeling of learning in response to being actively engaged in the classroom. Proceedings of the National Academy of Sciences, 116(39), 19251–19257.
- Kounios, J., & Beeman, M. (2014). The cognitive neuroscience of insight. Annual Review of Psychology, 65, 71–93.
Retrieval practice and testing effect
- Roediger, H. L., III, & Karpicke, J. D. (2006). Test-enhanced learning: Taking memory tests improves long-term retention. Psychological Science, 17(3), 249–255.
- Rowland, C. A. (2014). The effect of testing versus restudy on retention: A meta-analytic review. Psychological Bulletin, 140(6), 1432–1463.
- Adesope, O. O., Trevisan, D. A., & Sundararajan, N. (2017). Rethinking the use of tests: A meta-analysis of practice testing. Review of Educational Research, 87(3), 659–701.
- Karpicke, J. D., & Blunt, J. R. (2011). Retrieval practice produces more learning than elaborative studying with concept mapping. Science, 331(6018), 772–775.
- Agarwal, P. K., Nunes, L. D., & Blunt, J. R. (2021). Retrieval practice consistently benefits student learning: A systematic review of applied research in school classrooms. Educational Psychology Review, 33, 1409–1453.
- Pan, S. C., & Rickard, T. C. (2018). Transfer of test-enhanced learning: Meta-analytic review and synthesis. Psychological Bulletin, 144(7), 710–756.
Spacing and distributed practice
- Cepeda, N. J., Pashler, H., Vul, E., Wixted, J. T., & Rohrer, D. (2006). Distributed practice in verbal recall tasks: A review and quantitative synthesis. Psychological Bulletin, 132(3), 354–380.
- Cepeda, N. J., Vul, E., Rohrer, D., Wixted, J. T., & Pashler, H. (2008). Spacing effects in learning: A temporal ridgeline of optimal retention. Psychological Science, 19(11), 1095–1102.
Interleaving
- Brunmair, M., & Richter, T. (2019). Similarity matters: A meta-analysis of interleaved learning and its moderators. Psychological Bulletin, 145(11), 1029–1052.
- Pan, S. C., Tajran, J., Lovelett, J., Osuna, J., & Rickard, T. C. (2019). Does interleaved practice enhance foreign language learning? The effects of training schedule on Spanish verb conjugation skills. Journal of Educational Psychology, 111(7), 1172–1188.
Study strategies review
- Dunlosky, J., Rawson, K. A., Marsh, E. J., Nathan, M. J., & Willingham, D. T. (2013). Improving students’ learning with effective learning techniques: Promising directions from cognitive and educational psychology. Psychological Science in the Public Interest, 14(1), 4–58.
Desirable difficulties
- Bjork, R. A. (1994). Memory and metamemory considerations in the training of human beings. In J. Metcalfe & A. Shimamura (Eds.), Metacognition: Knowing About Knowing (pp. 185–205). MIT Press.
- Bjork, E. L., & Bjork, R. A. (2011). Making things hard on yourself, but in a good way: Creating desirable difficulties to enhance learning. In M. A. Gernsbacher et al. (Eds.), Psychology and the Real World (pp. 56–64). Worth.
- Soderstrom, N. C., & Bjork, R. A. (2015). Learning versus performance: An integrative review. Perspectives on Psychological Science, 10(2), 176–199.
- Koriat, A., & Bjork, R. A. (2005). Illusions of competence in monitoring one’s knowledge during study. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31(2), 187–194.
- Karpicke, J. D., & Roediger, H. L., III (2008). The critical importance of retrieval for learning. Science, 319(5865), 966–968.
Productive failure
- Sinha, T., & Kapur, M. (2021). When problem solving followed by instruction works: Evidence for productive failure. Review of Educational Research, 91(5), 761–798.
- Schwartz, D. L., & Martin, T. (2004). Inventing to prepare for future learning: The hidden efficiency of encouraging original student production in statistics instruction. Cognition and Instruction, 22(2), 129–184.
Expertise reversal effect
- Kalyuga, S., Chandler, P., & Sweller, J. (1998). Levels of expertise and instructional design. Human Factors, 40(1), 1–17.
- Kalyuga, S., Ayres, P., Chandler, P., & Sweller, J. (2003). The expertise reversal effect. Educational Psychologist, 38(1), 23–31.
- Tetzlaff, L., Peters, L., & Simonsmeier, B. A. (2025). The expertise reversal effect: A meta-analysis. Educational Psychology Review, 37.
Interactive simulations
- Finkelstein, N. D., Adams, W. K., Keller, C. J., Kohl, P. B., Perkins, K. K., Podolefsky, N. S., Reid, S., & LeMaster, R. (2005). When learning about the real world is better done virtually: A study of substituting computer simulations for laboratory equipment. Physical Review Special Topics - Physics Education Research, 1, 010103.
- Wieman, C. E., Adams, W. K., & Perkins, K. K. (2008). PhET: Simulations that enhance learning. Science, 322(5902), 682–683.
- Adams, W. K. (2010). Student engagement and learning with PhET interactive simulations. Il Nuovo Cimento C, 33(3), 21–32.
- Podolefsky, N. S., Moore, E. B., & Perkins, K. K. (2014). Implicit scaffolding in interactive simulations: Design strategies to support multiple educational goals. Working paper, University of Colorado Boulder.
- Usmeldi (2026). Meta-analysis of PhET interactive simulations on student learning outcomes. (Aggregation of 47 effect sizes, 20 studies, N = 4,563.)
Worked examples and adaptive fading
- Renkl, A., Atkinson, R. K., Maier, U. H., & Staley, R. (2002). From example study to problem solving: Smooth transitions help learning. Journal of Experimental Education, 70(4), 293–315.
Analogies
- Gentner, D., & Gentner, D. R. (1983). Flowing waters or teeming crowds: Mental models of electricity. In D. Gentner & A. L. Stevens (Eds.), Mental Models (pp. 99–129). Lawrence Erlbaum.
- Gentner, D. (1983). Structure-mapping: A theoretical framework for analogy. Cognitive Science, 7(2), 155–170.
- Clement, J. (1993). Using bridging analogies and anchoring intuitions to deal with students’ preconceptions in physics. Journal of Research in Science Teaching, 30(10), 1241–1257.
- Spiro, R. J., Feltovich, P. J., Coulson, R. L., & Anderson, D. K. (1989). Multiple analogies for complex concepts: Antidotes for analogy-induced misconception in advanced knowledge acquisition. In S. Vosniadou & A. Ortony (Eds.), Similarity and Analogical Reasoning (pp. 498–531). Cambridge University Press.
- Duit, R. (1991). On the role of analogies and metaphors in learning science. Science Education, 75(6), 649–672.
- Taber, K. S. (2001). The mismatch between assumed prior knowledge and the learner’s conceptions: A typology of learning impediments. Educational Studies, 27(2), 159–171.
Motivation and self-determination theory
- Howard, J. L., Bureau, J., Guay, F., Chong, J. X. Y., & Ryan, R. M. (2021). Student motivation and associated outcomes: A meta-analysis from Self-Determination Theory. Perspectives on Psychological Science, 16(6), 1300–1323.
- Bureau, J. S., Howard, J. L., Chong, J. X. Y., & Guay, F. (2022). Pathways to student motivation: A meta-analysis of antecedents of autonomous and controlled motivations. Review of Educational Research, 92(1), 46–72.
- Hanus, M. D., & Fox, J. (2015). Assessing the effects of gamification in the classroom: A longitudinal study on intrinsic motivation, social comparison, satisfaction, effort, and academic performance. Computers & Education, 80, 152–161.
LLM failure modes in education
- Bastani, H., Bastani, O., Sungu, A., Ge, H., Kabakcı, Ö., & Mariman, R. (2025). Generative AI without guardrails can harm learning: Evidence from high school mathematics. Proceedings of the National Academy of Sciences, 122(26), e2422633122.
- SycEval study (2025). Sycophancy in AI tutoring interactions. Presented at FAccT 2025.
- Wang, X., & Fan, S. (2025). ChatGPT and learning outcomes: A meta-analysis. (51 studies, g = 0.867.)
AI tutoring systems and constrained architectures
- VanLehn, K. (2011). The relative effectiveness of human tutoring, intelligent tutoring systems, and other tutoring systems. Educational Psychologist, 46(4), 197–221.
- Demszky, D., Liu, J., Mancenido, Z., Cohen, J., Hill, H., Jurafsky, D., & Hashimoto, T. (2024). Can AI improve the quality of human tutoring? A randomized controlled trial with Tutor CoPilot. Proceedings of EMNLP 2024.
- MWPTutor (ETH Zurich, 2024). LLM dialogue within finite state transducers for math word problem tutoring.
- SocraticLM (2024). Fine-tuning for Socratic tutoring dialogue. NeurIPS 2024.
- Corbett, A. T., & Anderson, J. R. (1994). Knowledge tracing: Modeling the acquisition of procedural knowledge. User Modeling and User-Adapted Interaction, 4(4), 253–278.
- Pane, J. F., Griffin, B. A., McCaffrey, D. F., & Karam, R. (2014). Effectiveness of Cognitive Tutor Algebra I at scale. Educational Evaluation and Policy Analysis, 36(2), 127–144. (RAND Corporation MATHia study.)
- Falmagne, J.-C., Koppen, M., Villano, M., Doignon, J.-P., & Johannesen, L. (1990). Introduction to knowledge spaces: How to build, test, and search them. Psychological Review, 97(2), 201–224. (Foundational Knowledge Space Theory behind ALEKS.)
- Muralidharan, K., Singh, A., & Ganimian, A. J. (2019). Disrupting education? Experimental evidence on technology-aided instruction in India. American Economic Review, 109(4), 1426–1460. (Mindspark RCT.)
Teaching at the Right Level
- Banerjee, A., Banerji, R., Berry, J., Duflo, E., Kannan, H., Mukerji, S., Shotland, M., & Walton, M. (2017). From proof of concept to scalable policies: Challenges and solutions, with an application. Journal of Economic Perspectives, 31(4), 73–102.
- Pratham (multiple years). Teaching at the Right Level program evaluations. J-PAL affiliated.
- Mayer, R. E. (2009). Multimedia Learning (2nd ed.). Cambridge University Press.
- Guo, P. J., Kim, J., & Rubin, R. (2014). How video production decisions affect student engagement: An empirical study of MOOC videos. Proceedings of ACM L@S 2014, 41–50.
- Szpunar, K. K., Khan, N. Y., & Schacter, D. L. (2013). Interpolated memory tests reduce mind wandering and improve learning of online lectures. Proceedings of the National Academy of Sciences, 110(16), 6313–6317.