Note: I’ve liberally repurposed some quotes from the book in my notes to self, which I then repurposed here. No plagiarism is intended.

Raising an eyebrow at the description, I approached this book cautiously, unconvinced by the “algorithmic thinking” craze in education and fully unenthused by the prospect of teaching small children to program a “turtle” to draw shapes on a screen. (For some context, I was underwhelmed by my own experiences with LOGO, and was generally wary of another nonfiction book wherein the author hits the reader repeatedly with his sledgehammer of a thesis without regard to objectivity or good epistemic practice.) TL;DR, Papert argues for an educational paradigm shift that can be catalyzed [only] by computers. His thesis is actually twofold: (1) learning happens through the improvement of our mental models, and (2) computers are the best/a very good way to mediate this kind of learning. Everything Papert went on to say about (1) resonated with me; what he said about (2) had either already come true or left me (initially) quite skeptical. Yet though Mindstorms was published in 1980, the accuracy of some of its predictions lent the rest of his reasoning much credibility. [For instance, he spends much time defending ideas like: programming need not be a recondite discipline; computers would catalyze the emergence of new ideas; computers would carry these ideas into a world larger than a research lab (e.g. via the ubiquity of today’s Internet).] So I read on.

I. He talks about learning in general. Epistemology is the theory of knowledge. Usually, the term describes the study of the conditions of validity of knowledge. Here though, Papert talks of Piaget’s epistemology, concerned not with the validity of knowledge but rather with its origin and growth – what he terms “genetic epistemology.” Basically, the claim is that people have a collection of models in their heads. These models/heuristics constitute what they know about the world. Accordingly, learning anything is easy if one can assimilate it to their collection of models. It further follows that what an individual can learn (and how they learn it) depends on what models and real-world data they have available. Papert argues for more “Piagetian learning” in schools, optimizing for conditions under which new models can take root. Educators should understand the nature of this “natural” learning. It notably does not mean regurgitating information, or any kind of tabula rasa/teacher-filling-empty-minds-of-students model. These natural learning paths include “false” theories. New and old knowledge sometimes contradict, and effective learning requires strategies to deal with this conflict. That is, sometimes we encounter data inconsistent with our expectations, or our intuition simply fails us. In these situations we need to improve our intuition. Education is about learning to improve this intuition/mental model collection. Sometimes the conflicting pieces of knowledge can be reconciled, sometimes one or the other must be abandoned, and sometimes the two can both be safely kept around in separate mental compartments – and all this is normal. For instance, in a child’s intuitive geometry, a straight line is not necessarily the shortest distance between two points, and walking slowly between two points does not necessarily take more time than walking fast. It is not merely an “item” of knowledge that is missing, but rather an epistemological presupposition underlying the idea of “shortest” as a property of the path rather than of the action of traversing it. In traditional schools, though, children are being force-fed “correct” theories well before they are ready to invent them, before their intuition says anything at all, and well before they care about the question the facts are addressing. After all, it’s easy to take truth for granted (in a “well, that’s obvious” way) without having had to derive it in the first place. For instance, natural selection seems “obvious” when it’s taught in an introductory biology class, but many very smart people didn’t believe it back in the 1800s, and nobody had verbalized it before Darwin either. Or how about when Descartes invented his grid? I don’t think about coordinates non-Cartesianly anymore, but even this was apparently once unintuitive enough that it had to be invented. (I’ll come back to this point in a bit.) It’s also worth noting that the timescales of this learning are very hard to measure. In particular, there are experiences we have that have disproportionately large or far-reaching consequences, but only many years later. At the end of the day, an educator ought to remember that what they see is not the learning itself; they can never access the full picture. What’s going on in students’ minds is often hard to access. Students need practice becoming aware of and communicating their thought process.
After all, the root of “education” is Latin’s ēdūcere – to draw out the existing knowledge (and models) in children’s heads (as opposed to “teaching how to think” per se – students already do this naturally!). Yet in a system centered around test results and measurable outcomes, Piagetian learning is all but ignored.

II. He speaks of the social environments/contexts of learning. How we think about knowledge affects how we think about ourselves. Students are exposed to a range of (potentially arbitrary) labels: STEM/humanities; smart/dumb; freshman/senior. People who believe they are “good at X” and [therefore] “bad at Y” may then view Y as foreign and “other”. These students self-report “making their head go blank” to memorize Y. In doing so, they encode a factoid in isolation, missing out on potential connections. Yet to learn something, one must 1) relate the new thing to something they already know and then 2) make the new thing their own. Imagine learning a foreign language by only memorizing a random list of vocabulary without building sentences or conversing! How pointless that seems, and how transient the knowledge. And to draw links between things, they must seem meaningful instead of arbitrary. There’s an overall lack of genuineness in traditional schooling. Why learn the parts of speech in elementary school? The distinction is pretty pointless, unless, for instance, you’re going to try to make a program produce reasonable sentences (a toy sketch of such a program appears at the end of this section). The reasons must be real. When a teacher tells a student that the reason for those many hours of arithmetic is to check change or calculate a tip, or that “math is used in all jobs”, that’s ridiculous. It’s just another instance of that unnecessary dishonesty in the educational relationship (along the lines of “let’s do that together” when the teacher already knows the answer). Discovery cannot be a setup; invention cannot be scheduled. The flow of ideas should not be a one-way street. How long I waited for “growing up”, only to find that real-world adults (or researchers!) didn’t really know better, and were nearly as confused as the rest of us. It’s worth noting that “genuine” doesn’t have to mean “real-world”. For some, the game is scoring grades; for others it is outsmarting the system. For many, school math is enjoyable in its repetitiveness, precisely because it is so mindless and dissociated. But the fact that people can find meaning in intrinsic dullness is not a reason to avoid improving it. Papert claims a good learning environment is real, socially cohesive, and one where experts and novices are all learning together. Learning should not feel compartmentalized or arbitrarily partitioned, and “in-school” time should be as enjoyable as “out of school” time (e.g. clubs, the things people choose to work on on their own). As a final story, imagine that children were forced to spend an hour a day drawing dance steps on squared paper and had to pass tests in these “dance facts” before they were allowed to dance physically. Would we not expect the world to be full of “dance-phobes”? Would we say that those who made it to the dance floor and the music had the greatest “aptitude for dance”? It is no more appropriate to draw conclusions about mathematical aptitude from children’s unwillingness to spend hundreds of hours doing sums.
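(An aside to make that “unless” concrete: below is a minimal sketch in Python, which is not the language Papert used, and the program is my own toy rather than anything from the book. A generator like this can only produce halfway-reasonable sentences because it knows which words are articles, adjectives, nouns, and verbs; the parts-of-speech distinction stops being arbitrary and becomes load-bearing.)

```python
import random

# Tiny word lists, grouped by part of speech. The grammatical
# categories are exactly what makes the output sentence-like.
articles = ["the", "a"]
adjectives = ["hungry", "purple", "sleepy"]
nouns = ["turtle", "teacher", "computer"]
verbs = ["draws", "debugs", "imagines"]

def sentence():
    """Assemble ARTICLE ADJECTIVE NOUN VERB ARTICLE NOUN."""
    return " ".join([
        random.choice(articles),
        random.choice(adjectives),
        random.choice(nouns),
        random.choice(verbs),
        random.choice(articles),
        random.choice(nouns),
    ])

for _ in range(3):
    print(sentence())  # e.g. "a sleepy turtle debugs the computer"
```

Collapse the four lists into one undifferentiated pile of words and the output degenerates into word salad; the grammar lesson becomes real the moment you actually need it.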

III. Given (I) and (II), the traditional school system/education is in a bad spot. Papert forewarns that the human-computer interface needs to be implemented with care and intent, lest historical accidents lead to strange side effects. That is, in developing a new system/technology, it’s worth putting some time into making sure it’s actually doing what’s intended before wider implementation. For instance, BASIC is a lot less readable than Python, and if it had stayed the standard (as it was for some years), programming might look very different today. He also talks about how QWERTY sucks (though maybe this is an urban legend), and how humanity was a bit hasty during the Industrial Revolution. Education is no different. School is a set of historical accidents. A committee of 10 people decided the standard curriculum; it’s said that we often learn science in the order “biology, chemistry, physics” only because these were listed in alphabetical order. Likewise, a major factor that determined what math went into the standards was what could be done in a classroom with pencil and paper (e.g. I’d agree graphing parabolas is not particularly fundamental to understanding math). To avoid this, Papert advocates identifying for every subject X the difference between “school X”, “proto X” (knowledge about X presupposed by school X), and “missing X” (what students should understand about X that is not in school X). He notes that education should probably be rethought entirely; the car was not made by gradually trying to improve the horse and carriage. Only looking at what already exists is insufficient. Not only is school bad, but research to improve it is also in a rough spot. There is no recognized place in academia for e.g. people whose research is really physics, but in educationally meaningful directions. Such people are not particularly welcome in a physics department, as their education goals trivialize their work in the eyes of other physicists. Nor are they welcome in the education school, where their highly technical language is not understood and their research criteria are out of step. These hypothetical physicists will see their work very differently, as a theoretical contribution to physics that in the long run will make knowledge of the physical universe more accessible, but which in the short run would not be expected to improve the performance of students in a physics course. The concept of a serious enterprise of making science for the people was, at the time of writing, quite alien. (And perhaps still is. That makes me very sad – for once, research that would interest me! but apparently nobody’s hiring.)

IV. Used intentionally, computers are a very good way to improve the learning situation. Okay, I was on board so far. But Papert argues that the computer in particular is a likely panacea. Computers, he argues, cross cultural barriers and make scientific knowledge intimately part of individuals’ lives, personalizing otherwise obscure facts. Initially, I viewed his comments as akin to how famous physicists fell in love with radio sets or cars. Because of the computer’s simulation capabilities, he considers them universal vectors for cultural seeds, and cultural assimilation as the inculcation of a way of thinking. (Perhaps he foretells the Internet.) Children appropriate all the things in their environments (e.g. the models cherished, the metaphors and connections drawn) to build their own, and when the computer becomes ubiquitous, children will have access to better data for better models. His argument became convincing with the following line: “in teaching the computer how to think, children embark on an exploration about how they themselves think… Thinking about thinking turns the child into an epistemologist”. He continues drawing more connections between good learning and programming: debugging gives students a growth mindset (turning the dichotomy from “right/wrong” into “fixable/not”), and also forces students to verbalize what exactly the next step is or should be. (That is, getting a computer to do something requires that the underlying process be described with enough precision to be carried out by the machine.) Students may learn to have the discipline to think before mindlessly calculating (pseudocoding, at least to some extent, before typing). Even if computers are not the only way to learn this skill, I admit it’s a pretty transparent and accessible way to start being articulate about debugging strategies. As learners become experts in any field, they have not just the object-level facts, but the connections/network between them. Papert speaks of how “expert learners” use certain metaphors to talk about important learning experiences. They talk about “getting to know” an idea, “exploring” a field, and acquiring sensitivity to distinctions that seemed ungraspably subtle just a moment ago. That is, learning is about developing an aesthetic and taste! But to do that, one needs many examples to “machine learn” off of. Computers can provide those simulation worlds, giving children the relevant data/training set. But computers don’t just help by simulating. Papert believes the computer is much more than a tool for pre-programmed instruction – and thus fundamentally different from the radio or TV, whose inventions also created a fuss in education. Instead, he says that its importance lies in computing culture and computational thinking. Computers facilitate the Piagetian learning that takes place as a child grows up. But “teaching without curriculum” does not mean spontaneous, freeform classrooms or simply leaving a child alone. In this model, educational intervention means supporting children as they build their own intellectual structures with materials drawn from the surrounding culture, a culture educators can add constructive elements to and eliminate noxious ones from. (That is, educators ought to feed the student-evidentialist good data.) He adds that the vocabulary CS introduced is a key part of its culture. In general, people need more structured ways to talk and think about the learning of skills.
Many scientific and mathematical advances have served a similar linguistic function by giving us words and concepts (models) to describe what had previously seemed too amorphous for systematic thought. Why is it, he asks, that children are unable to systematically and accurately list all the possible combinations of colored beads until 5th or 6th grade [citation needed]? (This was shocking.) He claims this is because there was no commonly used vocabulary for things like “bug”, “nested loops”, or “double-counting” (a sketch of what that vocabulary buys you follows below). Our culture, he claims, is poor in models of systematic procedures. With computers, children can learn to be systematic before they learn to be quantitative. [I’m a fan of the argument that vocabulary influences thinking (weak Sapir-Whorf). Much of the value in reading TFaS, for instance, was getting a library of labels for cognitive biases. CFAR vocabulary (“debugging”) is suggestive of the impacts CS has had on “rationality”. But can you acquire such vocabulary through some metacognitively rich approach that doesn’t so heavily rely on computers? I’m not against computers per se – just unconvinced they’re a necessary or even optimal ingredient.] So is learning systematic/”algorithmic” thinking the only way forward? No. While curriculum reformers are often concerned about making the choice between learning strategies X and Y from above and building it into the curriculum, what Papert hopes for is for learners to learn how to make that choice for themselves. He considers algorithmic thinking a tool among many, and wants learners to become expert in recognizing and choosing among varying styles of thought. No knowledge is entirely reducible to words, and no knowledge is entirely ineffable – having the vocabulary for this mode of thinking isn’t a panacea after all. But still, an important part of becoming a good learner is learning how to push out the frontier of what we can express with words.
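(Again, my own toy sketch rather than anything of Papert’s: once you have the words “nested loop” and “double-counting”, systematically listing every pair of colored beads – one version of the task he mentions – is almost mechanical.)

```python
colors = ["red", "green", "blue", "yellow"]

# Nested loops enumerate every pair exactly once; starting the inner
# loop at i + 1 is precisely what rules out double-counting (and
# pairing a bead with itself).
pairs = []
for i in range(len(colors)):
    for j in range(i + 1, len(colors)):
        pairs.append((colors[i], colors[j]))

print(len(pairs), pairs)  # 6 pairs from 4 colors, i.e. C(4, 2)

# Once the pattern has a name, it is also a library one-liner.
from itertools import combinations
assert list(combinations(colors, 2)) == pairs
```

The code itself is trivial; the claim is that naming the pattern is what makes the systematic procedure thinkable in the first place.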

V. Lingering thoughts/concerns/questions (or, inchoate opinions). Throughout, I wondered about other ways to introduce algorithmic thinking. Math was an obvious one. But I also wondered how much of my anti-algorithmic-thinking view came down to viewing CS as bashy and math as elegant – after all, whenever Papert spoke of implementation via math instead of CS, I had no issues. Am I just biased against CS? Why? I don’t even mind casework – in math, at least… I also wasn’t convinced by the claims that students who learn CS will learn to favor modularity. A working program can certainly be bashy, and I’m not convinced people will clean it up by default. At SSP, students produced some nasty, convoluted code to avoid learning how to e.g. write a for loop. (That is, S1 tries to minimize effort, even if the process ends up taking longer.) I grudgingly agree this is fine insofar as students will always produce stuff they understand instead of regurgitating things they don’t, but I’m not sure they’ll push themselves to do it better. Papert depicts what I agree is an optimal learning situation, and I agree that subdividing problems into simpler steps is a good metacognitive technique that CS might, in the right circumstances, promote. But how does theory translate to practice? How should educators implement these ideas? Ah well, I suppose that’s beyond the scope of his book. Papert’s computer “microworlds” and simulations are artificial – that is, deliberately invented – Piagetian material. Indeed, they function as carriers of powerful ideas for learners, separating the powerful big ideas from their inaccessible formalisms. His microworlds are stripped of complexity and are graspable. Debugging is most effective when the modules are small enough for it to be unlikely that any one contains more than one bug (see the sketch below). Skills and discrete facts are easy to teach and learn one at a time. [E.g. it’s easy to teach people to associate “protein” with “amino acid”, but hard to give them the whole network of knowledge without throwing the (not-necessarily-proverbial) textbook at them.] In some ways, this feels similar to replacing Shakespeare with simplified text. Do I agree with this technique in general? Not sure. (My literature sensibilities scream no, but I admit I do this when teaching biology and chemistry.) Distilling something to its core & stripping away all the exceptions makes the inaccessible (to the point of arcane, really) enjoyable for a larger audience, which seems at the very least like a reasonable entry point. A big concern throughout was this notion of “over-scaffolding”, or breaking things down for learners too much. In coming up with the perfect analogy or model for a learner, I’m doing the cognitive lifting, leaving them only the bite-sized, standards-focused, overly-predigested pieces. Am I oversimplifying? Do students just chalk the existence of such models up to magic? – Then again, is that not what we do with real-world phenomena? At some layer, perhaps it gets axiomatic… Maybe more struggle would be better: there’s benefit in things that are just hard. I think [some] SSP students build character (or at least learn something valuable) in their struggles. And what if students break the concept down into small parts and understand the units but never chunk upward? – if they understand each line without seeing the bigger picture. How do you make students generalize and reflect upon what they’ve learned? Is it really as simple as feeding evidentialists good evidence?
Are people even good evidentialists to begin with?! Do people really strive to be logically consistent by default?? Is this what Papert assumes?
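(On microworlds and debugging small modules, here’s my own Python-turtle rendition of a LOGO-style “square plus roof” kind of exercise; the code is mine, not Papert’s. Each procedure is small enough that a wrong picture points at exactly one place to look.)

```python
import turtle

def square(t, size):
    # Four equal sides, four 90-degree turns. If the walls come out
    # wrong, the bug can only live here.
    for _ in range(4):
        t.forward(size)
        t.left(90)

def triangle(t, size):
    # Equilateral roof: the turtle turns through the 120-degree
    # exterior angle, not the 60-degree interior one (a classic
    # first-attempt bug).
    for _ in range(3):
        t.forward(size)
        t.left(120)

def house(t, size):
    # Built from already-debugged parts: square walls, then walk up
    # to the roof line and add the triangular roof.
    square(t, size)
    t.left(90)
    t.forward(size)
    t.right(90)
    triangle(t, size)

t = turtle.Turtle()
house(t, 100)
turtle.done()
```

If the roof shows up in the wrong place, only the short house procedure (or the walk-up step inside it) can be at fault; that localization is the whole argument for modules small enough to hold at most one bug each.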

Well. I suppose some of this depends on what the learning goal is. In general, scaffolds should support the goal and clear away unnecessary underbrush. Thus, it’s worth keeping in mind that the point isn’t teaching the content, but rather improving students’ metacognitive skills. This is why I can’t make learning e.g. astrophysics too easy for SSPers: that way, students can practice their own meta-skills and figure out how to learn better themselves, in situations beyond the models given in the classroom.