
Participants: Derya Akbaba * Ben Allen * Natalia-Rozalia Avlona * Kirill Azernyi * Erin Kathleen Bahl * Natasha Bajc * Lucas Bang * Tully Barnett * Ivette Bayo * Eamonn Bell * John Bell * kiki benzon * Liat Berdugo * Kathi Berens * David Berry * Jeffrey Binder * Philip Borenstein * Gregory Bringman * Sophia Brueckner * Iris Bull * Zara Burton * Evan Buswell * Ashleigh Cassemere-Stanfield * Brooke Cheng* Alm Chung * Jordan Clapper * Lia Coleman * Imani Cooper * David Cuartielles * Edward de Jong * Pierre Depaz * James Dobson * Quinn Dombrowski * Amanda Du Preez * Tristan Espinoza * Emily Esten * Meredith Finkelstein * Caitlin Fisher * Luke Fischbeck * Leonardo Flores * Laura Foster * Federica Frabetti * Jorge Franco * Dargan Frierson * Arianna Gass * Marshall Gillson * Jan Grant * Rosi Grillmair * Ben Grosser * E.L. (Eloisa) Guerrero * Yan Guo * Saksham Gupta * Juan Gutierrez * Gottfried Haider * Nabil Hassein * Chengbo He * Brian Heim * Alexis Herrera * Paul Hertz * shawné michaelain holloway * Stefka Hristova * Simon Hutchinson * Mai Ibrahim * Bryce Jackson * Matt James * Joey Jones * Masood Kamandy * Steve Klabnik * Goda Klumbyte * Rebecca Koeser * achim koh * Julia Kott * James Larkby-Lahet * Milton Laufer * Ryan Leach * Clarissa Lee * Zizi Li * Lilian Liang * Keara Lightning * Chris Lindgren * Xiao Liu * Paloma Lopez * Tina Lumbis * Ana Malagon * Allie Martin * Angelica Martinez * Alex McLean * Chandler McWilliams * Sedaghat Payam Mehdy * Chelsea Miya * Uttamasha Monjoree * Nick Montfort * Stephanie Morillo * Ronald Morrison * Anna Nacher * Maxwell Neely-Cohen * Gutierrez Nicholaus * David Nunez * Jooyoung Oh * Mace Ojala * Alexi Orchard * Steven Oscherwitz * Bomani Oseni McClendon * Kirsten Ostherr * Julia Polyck-O'Neill * Andrew Plotkin * Preeti Raghunath * Nupoor Ranade * Neha Ravella * Amit Ray * David Rieder * Omar Rizwan * Barry Rountree * Jamal Russell * Andy Rutkowski * samara sallam * Mark Sample * Zehra Sayed * Kalila Shapiro * Renee Shelby * Po-Jen Shih * Nick Silcox * Patricia Silva * Lyle Skains * Winnie Soon * Claire Stanford * Samara Hayley Steele * Morillo Stephanie * Brasanac Tea * Denise Thwaites * Yiyu Tian * Lesia Tkacz * Fereshteh Toosi * Alejandra Trejo Rodriguez * Álvaro Triana * Job van der Zwan * Frances Van Scoy * Dan Verständig * Roshan Vid * Yohanna Waliya * Sam Walkow * Kuan Wang * Laurie Waxman * Jacque Wernimont * Jessica Westbrook * Zach Whalen * Shelby Wilson * Avery J. Wiscomb * Grant Wythoff * Cy X * Hamed Yaghoobian * Katherine Ye * Jia Yu * Nikoleta Zampaki * Bret Zawilski * Jared Zeiders * Kevin Zhang * Jessica Zhou * Shuxuan Zhou

Guests: Kayla Adams * Sophia Beall * Daisy Bell * Hope Carpenter * Dimitrios Chavouzis * Esha Chekuri * Tucker Craig * Alec Fisher * Abigail Floyd * Thomas Forman * Emily Fuesler * Luke Greenwood * Jose Guaraco * Angelina Gurrola * Chandler Guzman * Max Li * Dede Louis * Caroline Macaulay * Natasha Mandi * Joseph Masters * Madeleine Page * Mahira Raihan * Emily Redler * Samuel Slattery * Lucy Smith * Tim Smith * Danielle Takahashi * Jarman Taylor * Alto Tutar * Savanna Vest * Ariana Wasret * Kristin Wong * Helen Yang * Katherine Yang * Renee Ye * Kris Yuan * Mei Zhang
Coordinated by Mark Marino (USC), Jeremy Douglass (UCSB), and Zach Mann (USC). Sponsored by the Humanities and Critical Code Studies Lab (USC), and the Digital Arts and Humanities Commons (UCSB).

Code Critique: What Python does when it's doing nothing.

Code Critique: nothing.py

Language: Python

Authors: n/a

Years: 1990-

Works cited: Frings et al., Massively Parallel Loading, ICS 2013. (pdf)

TL;DR Extracting meaning from code requires understanding (modeling) what is left unsaid.


The code I'll be critiquing is an empty Python script, perhaps the computer science equivalent of 4'33''.

Creating the code is trivial on any *nix machine:
$ python --version
Python 2.7.17
$ touch nothing.py
$ ls -l
total 0
-rw-r--r-- 1 rountree rountree 0 Jan 18 14:06 nothing.py
$

touch creates an empty file with the specified filename, and ls -l shows that the size of the file is indeed 0. Running this code is about as exciting as you might expect.

$ python ./nothing.py
$

The temptation is to think that nothing happened: we didn't ask Python to do anything, and it complied with alacrity. In most computing environments, that summary is a perfectly serviceable abstraction of what happened. But it is an abstraction, and interesting things happen when ground truth starts diverging too far from our abstraction of it.

Let's take a closer look.

$ strace python ./nothing.py 
execve("/usr/bin/python", ["python", "./nothing.py"], 0x7ffc76d71d08 /* 56 vars */) = 0
brk(NULL)                               = 0x55a94fcaf000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
....

strace captures all of the calls that the Python interpreter makes into the Linux kernel (e.g., brk), lists the parameters for each of those calls (NULL) and the value that's returned (0x55a94fcaf000), possibly with an explanation if there's an error (-1 ENOENT (No such file or directory)). What each of those individual lines means to a systems programmer is beyond the scope of this critique. The aggregate numbers are more interesting.

$ strace python ./nothing.py 2>&1 | wc -l
746

The 2>&1 is a bit of magic that reroutes output sent to stderr (which strace uses) to stdout (which the rest of the tools expect). The pipe | routes the stdout of the prior command to the stdin of the next command. wc -l simply counts how many lines there are in stdin and outputs that number.

Roughly speaking, in order to run nothing.py, our Python interpreter made 746-ish calls into the Linux kernel.

Which seems a little excessive.

Through the magic of bash one-liners we can get a sense of what's going on: cut -d "(" -f 1 keeps everything up to the first open parenthesis (that is, the name of the system call), sort | uniq -c tallies the duplicates, sort -rn orders the tally by count, and head keeps the top ten.

$ strace python ./nothing.py 2>&1 | cut -d "(" -f 1 | sort | uniq -c | sort -rn | head
234 openat
 99 fstat
 98 read
 87 stat
 68 rt_sigaction
 66 close
 25 mmap
 14 mprotect
  8 brk
  8 access
 [only the top 10 shown]
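
If the one-liner is a bit much, here is a rough Python equivalent (a sketch only: it assumes the strace output has been saved to a file first, and the filename nothing.strace is made up):

    from collections import Counter

    # Tally system-call names from a saved strace log, much as the
    # cut | sort | uniq -c | sort -rn | head pipeline does above.
    counts = Counter()
    with open("nothing.strace") as log:
        for line in log:
            if "(" not in line:
                continue          # skip summary lines such as "+++ exited with 0 +++"
            counts[line.split("(", 1)[0].strip()] += 1

    for name, n in counts.most_common(10):
        print("%4d %s" % (n, name))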

That's 234 distinct attempts to open files. On a laptop, that happens so fast as to be unnoticeable. On a supercomputer, however....

KULL [15] is a large multi-physics application developed at LLNL. When it was first run on Dawn, an IBM BlueGene/P system, its start-up exposed serious scaling challenges and significantly disrupted the entire computing center. The KULL executable was dynamically linked to about one thousand shared libraries, most of them staged in an NFS file system to ease maintenance. Loading these libraries during program start-up scaled extremely poorly. The time from job launch to the initial invocation of KULL’s main function, for instance, took about one hour at 2,048 MPI processes and ten hours at 16,384 processes. Further, during start-up, the NFS server was overwhelmed with load requests from KULL, and other users and jobs were unable to connect to the NFS while KULL start-up continued. [Frings 2013, emphasis in original]

The problem was that both the Python libraries and those used by KULL itself were all located on a single filesystem that the compute nodes accessed over the network. 16,384 processes starting up and simultaneously demanding 1,000+ libraries each utterly overwhelmed the filesystem. The fix was clever enough to be published (distribute libraries via a tree network, see the paper for details), but doesn't inform this critique.

And with that background in place we can now do a code critique.


So what does this code --- or lack of code --- actually mean?

Trawling through the Python source code to understand its initialization process would be tedious and error-prone. It's much faster to run Python under instrumentation (in this case, strace) and watch what it does. The empty script guaranteed that all the activity we'd be seeing would be coming from Python, not our code.

Ok, that's what you did, but you still haven't told me what it means.

Three points.

First, the text of code is an unreliable guide to its performance. There may be a strong guarantee as to correct behavior, but there are rarely any guarantees about how that behavior will be accomplished. The behavior of code is generally more important than its text, but that behavior is contingent upon everything from the rest of the computing ecosystem to the society in which the code operates.

Second, there might be a temptation to claim that if we had enough information about the system we could have perfect knowledge of the code's behavior. We're never going to be able to get that much information out of nontrivial systems, so we're left with using simplified models. As Box remarked: "All models are wrong, but some are useful." Becoming skilled in the art of system design, programming or code critiquing requires figuring out where your model can simplify without doing too much violence to your results.

Third, CCS understandably puts a great deal of emphasis on text. This example is a corner case of how that approach can break down. Unfortunately, learning how to do code instrumentation is at least as difficult as learning how to code in the first place, and there's definitely a lack of friendly guides to performance analysis. One way around the problem might be more collaboration.


Questions for discussion:

A. What examples of implied, misleading or ironic code have you come across in your own work?

B. What tools do you use to compare code-as-text versus code-as-performance?

C. How should CCS handle implied, misleading or ironic code?


Notes: I generated the example using Python 2.7.17 running on a Linux 5.0.0-37 kernel provided by Ubuntu 18.04.3 (Bionic Beaver). Running Python 3.6.9 on that same machine only generates 67 calls to openat.

OSX users can substitute dtruss for strace, but doing so requires running as root and disabling system integrity protection.

Comments

  • Hi Barry -- thanks so much for this! Very thought-provoking, and point well taken on the limitations of text-first approaches under some conditions of investigation.

    This is so timely -- 2.7 is officially retired and unsupported as of this year, yet still very widely deployed.

    It might be worth specifying "Python 2" as the language up top before the final footnote, and/or throwing a python --version into one of your console logs in the critique above, as this could be clarifying for people trying to follow along on different systems. Maybe also worth letting the MacOS users know that their default python is likewise 2.7, for those who don't know.

    Interesting that Python 3 only generates 67 calls, not 746 -- I wonder what directions in the evolution of the language may have led to this new concision (rather than it bulking up over time).

    I also wonder how launching the interactive shell in verbose mode (python -v) -- and what it has to say, verbosely -- is related (or not) to what you are investigating here with empty execution?

    One other question about this "do nothing" / null input investigation. Are there related empty file / empty input genres of computing culture, not unlike quines or code golf? I can think of the question "what counts as a valid minimal / empty program?" -- which has an interesting Rosetta Code page with 237 languages listed...

    https://www.rosettacode.org/wiki/Empty_program

    ...but I'm trying to recall if there are other code practices -- debugging or culture jamming -- centered on emptiness and absence.

  • edited January 2020

    There are some, @jeremydouglass. I have a 12-page technical report about null programs from 2013:

    “No Code: Null Programs.”

    It covers a 0-byte demo (in the demoscene sense) and a 0-byte quine that won the IOCCC (International Obfuscated C Code Contest) for being the shortest quine (a program that produces its own source code as output).

  • Thanks Nick -- that's exactly what I was trying to recall. I remember us discussing that report during CCS-WG 2014.

  • This reminds me of Pall Thayer's Microcode work Sleep. Modeled on Warhol's film, it's a program that does nothing for eight hours; but of course background processes are active throughout.

    Zero-byte code is kind of an obsession of mine. I made a programming language and file system using only folders because Windows reports them as taking up zero bytes. All temp files created by the interpreter are folders, all data storage is in the names of folders. This way, even a modest hard-drive is infinitely large...
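
    A toy Python sketch of the folder-name idea (not Temkin's actual Folders language; the directory name and message are just illustrations):

        import os, tempfile

        # Store a message in a directory *name* rather than in file contents,
        # so the data itself occupies zero bytes of reported file size.
        root = tempfile.mkdtemp()
        os.mkdir(os.path.join(root, "hello world"))   # the data lives in the name

        # Reading it back is just listing the directory.
        print(os.listdir(root)[0])                    # prints: hello world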

    Also, there is a program smaller than one with zero bytes, the file that does not exist. In the programming language Unnecessary, this is the only valid program. Even the existence of an empty file is considered excessive. The only program (the one that doesn't exist) prints its own source code to the screen. Since there is no source code, it prints nothing, making it a variation of the null-quine explained in Nick's paper.

  • @jeremydouglass, @nickm : The idea of null programs is interesting in its own right, but I think I may have buried my lede in my description. The larger point is that there's a substantial amount of code being executed, from libraries to system calls to firmware, and the totality of that forms the environment in which the particular program of interest has to operate. Null programs are one way of moving that background execution to the foreground, whether by using strace, hardware performance counters, or even something as simple as top.

    The difficulty, particularly for this group, is that system software isn't something self-taught programmers are usually exposed to, and even most CS students are happy to focus on their particular APIs for their preferred language and assume the rest of the programming environment Just Works. The folks who do high performance computing are more likely to be aware of the system software stack, if only because they're pushing the machine to its limits as a matter of course.

    Anyway, thanks for the comments. Bill and I are writing this up as a case history for an upcoming DH paper, and you've got me thinking how I can rearrange the presentation. I'll add the versioning information as you suggested.

  • @Temkin I had no idea zero-byte programs were a thing. Thanks for sharing that.

  • C. How should CCS handle implied, misleading or ironic code?

    How does other criticism handle it? (This is a genuine question, not a rhetorical one.)

    A. What examples of implied, misleading or ironic code have you come across in your own work?

    I think there are a handful of categories that the ironic code I've seen falls into (excluding merely a dry sense of humour in the comments). This isn't necessarily an exhaustive list:

    Accidental irony

    The first, often amusing, arises due to the inevitable entropy that creeps in when editing code: it begins to drift from the behaviour outlined in the (originally well-meaning) comments. (Source citation elided to protect the guilty here.)

        /* don't forget to count this additional object */
        obj_count --;
    

    (Originally, a counter was bumped at this point.)

    There's no way - other than careful code review - to check the applicability of human-readable language in comments.

    Malicious misstatement

    The second can arise by accident, perhaps - where code has unintended behaviour as a direct consequence of a 'typo'. A famous - and deliberate - example of this is:

    if ((options == (__WCLONE|__WALL)) && (current->uid = 0))
            retval = -EINVAL;
    

    (an explanation of this can be found here.)

    There are other ways to deliberately obscure the meaning of code. One might misuse formatting so as to present a single statement as multiple ones (or vice versa). Many of the tricks of code golf would have applications here.

    I'm not sure if there exists a direct analogy to this in other texts. You might consider the "technically I didn't lie" kind of mendacity so often found in political public statements as an example of this; perhaps there are other examples.

    Horrendous lack of taste

    I've seen the trick invented more than once: overload unary minus and the less-than operator (in languages that permit it) to construct a "backwards pointer", <-. This kind of trickery may be well-intentioned but it can only be described as unutterably awful. More to the point, what it looks like it's doing is nothing like what it's actually doing.
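
    A minimal Python sketch of the trick (hypothetical class names; the point is that x <- y parses as x < (-y)):

        class Target(object):
            def __lt__(self, other):
                # the "comparison" secretly performs an assignment
                self.value = other
                return True

        class Source(object):
            def __init__(self, value):
                self.value = value
            def __neg__(self):
                # unary minus just hands back the wrapped value
                return self.value

        x = Target()
        x <- Source(42)        # parses as x < (-Source(42))
        print(x.value)         # 42 -- nothing like what the arrow suggests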

    Non-locality giving rise to surprise

    A final category arises when code is oddly surprising - specifically due to non-locality of reference. There's a question in the "hacker test" (copy here) that alludes to an actual historical event (or, perhaps, an urban legend - it's hard to tell):

     Ever change the value of 4?
     ... Unintentionally?
     ... In a language other than Fortran?
    

    A modern equivalent to this sort of thing arises with monkey-patching, or the use of aspect-orientation. We look for meaning near the code we're examining, typically. When behaviour is mutable "at a distance" it's much harder to get a local view; a reader of code needs a more holistic approach. Ruby's model, as interesting as it is, leaves everything open for mutation (see here for some examples and a discussion). This behaviour doesn't just make type-checking hard; it makes code comprehension difficult on the part of any reader. (This is certainly the case for a mundane reading by a programmer.)

    More well-meaning examples can be found, for example, in the Python eventlet library. Here standard library functions are replaced with near-equivalents (that have close to the same functionality in this case, thank goodness). Ironic? Technically, I suppose. Misleading? In this case, not by intention, although the technique, like many others, could be misapplied for impish purpose.
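
    A tiny Python sketch of what such patching can look like (a hypothetical, well-meaning patch, not eventlet's actual mechanism):

        import time

        # Somewhere far from the call site, a "helpful" module rebinds a
        # standard library function...
        _original_sleep = time.sleep

        def noisy_sleep(seconds):
            print("sleeping for %s seconds" % seconds)
            return _original_sleep(seconds)

        time.sleep = noisy_sleep

        # ...and elsewhere, perfectly ordinary-looking code now does
        # something its author never wrote.
        time.sleep(0.1)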

  • I didn't mean to drive the conversation away from your main point, @barry.rountree, which is a good one. I was just answering @jeremydouglass’s last question.

    Another technical report I wrote, this one for CCSWG specifically, was actually more of an attempt to wander toward your main point. In 'The Trivial Program “yes”' I discuss a program (a GNU core utility) that does very little, but which has source-code “overhead.” Of course, I wasn’t focused on what is happening with calls to the kernel, which is interesting to investigate here in the context of the Python interpreter. I was looking at the static (source code) overhead. And, I was trying to read a program that did very little, but not nothing.

  • Great examples and references!

    One extreme of “misleading” code is when it’s intentionally obfuscated, and not written in a truly “malicious” way — obfuscation can of course be done for aesthetic/poetic reasons or just for fun. The work in the IOCCC provides many examples, including many classics. Michael Mateas and I wrote about this, along with the esoteric languages that are @Temkin’s expertise, in our 2005 Digital Arts and Culture paper “A Box, Darkly: Obfuscation, Weird Languages, and Code Aesthetics.”

  • @jang

    How does other criticism handle [implied, misleading, ironic code]? (This is a genuine question, not a rhetorical one.)

    Speaking first to implied code: Reaching back to my theater degree, Bernard Beckerman's approach was to encapsulate everything leading up to the performance into the concept of "precipitating context." That captures everything implicit leading up to the performance: from the architecture of the theater to the expectations of the audience. If you're studying the text of a play, it's helpful (and often necessary) to have a handle on that context in order to get at both the author's intent (if you think that's important) and your hypothetical audiences' expectations.

    Actor-network theory takes a much more extreme position where everything can be examined by its location and interactions with other things/concepts/stuff in the network. This is coming out of sociology, not theater, which allows it to be a lot more expansive. Large chunks of this "network of everything" can be "punctualized" (awful, awful term) to a single network node: if understanding the circuit board and the chemistry of LCDs isn't useful for my analysis, I can wrap those up into a new node I'll call "monitor" and just work with the abstraction instead. Thus, the implicit bits of the network are punctualized, and can be unpunctualized as needed.

    Nonmaterial Performance (which isn't yet a thing, but we're trying) emphasizes that code performs, but also that performance occurs within a network of other performing code (among other things like processor architecture and funding agencies). One way of thinking about the null program I presented is asking "Who is the audience, who is the performer?". Roughly speaking, the Python interpreter is performing for the user, but is also an audience for the performance of the kernel. The kernel is in turn an audience for the performance of the firmware in the hard drives and monitor. By focusing on code-as-performance I think it's a little easier to account for implicit code (and user assumptions, and hardware limitations). Code-as-text, much like plays-as-text (and algorithms-as-math), tends to assume we all share the same idea of the generic environment where the text will be performed. That can work well, of course, but the more machine-specific or task-specific the code, the more the performance dominates the text.

    There's no way - other than careful code review - to check the applicability of human-readable language in comments.

    Yes.

    Here's my standard example of ironic code: FIRESTARTER. It's a processor stress test (or "power virus," if you like). Basically, it's a pile of carefully handcrafted vector instructions that spin in a tight loop calculating nonsense and discarding the results. The intent of the code, of course, is to maximize the power draw of the processor (and that's perfectly clear from the comments). It's a great example of having to understand the context of the code (which instructions/inputs draw the most power and why) to understand what the code is doing.
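
    The shape of the idea in Python, purely for illustration (FIRESTARTER itself is hand-tuned vector code, and an interpreted loop like this draws nowhere near as much power):

        # Spin in a tight loop, compute something meaningless, and never use
        # the result. Runs (and heats one core) until interrupted.
        x = 1.0000001
        while True:
            _ = x * x % 1.5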

    (an explanation of this can be found here.)

    Thank you! That was exactly the one I had in mind, but I hadn't bothered trying to track it down yet. I remember when it came out: it certainly made an impression.

    Thanks for the great comments.

  • @nickm

    I'll see your /usr/bin/yes and raise you a /bin/true. It's also useful in scripts and does even less than yes. What makes it interesting (to me at least) is how overengineered the GNU version has become over the years: true.c. There are four header files, calls to setlocale() and textdomain(), mixed use of fputs() and printf().... I wouldn't use any of that if I was writing my own implementation, but it's a fascinating snapshot of how the GNU folks think about software engineering and their tools environment.

    Whether or not a critique of that apparatus gets you to the point where you can better understand the culture that gave rise to the Free Software movement is an open question. There are probably easier ways of doing that. Perhaps that analysis falls closer to software studies or simply the history of computing. Dunno.

  • I love that list of empty programs. Now I know which languages consider zero-byte files valid code.

  • Returning to @barry.rountree's original question about what to make from all of this extratextual activity of the program...

    What I find fascinating about all this is that there are very close to zero programs which don't have implicit code; code which is executed but is not represented. Even when we go back to the earliest programming languages, Grace Hopper's A-* series of programming languages, running on the bare metal with no operating system support, there was still a required implicit organization of the system memory that, I think, would cause a zero-byte input to compile to an output with content. Difficult to tell without going back to the source code (and if anyone wants that interesting task, it's in the Stanford library). Even if it does compile to a zero byte code, I doubt very much that the proper procedure to run it would be to load a tape with no data. Operating these machines was itself often a task of loading one thing in order to load another. Even in the case of the earliest programs written in the modern style, running on the function tables of the modified ENIAC, the plugging of the ENIAC itself was a prerequisite.

    What's even more interesting, IMO, is that often this implicit code is needed in order to do things that are faux pas in code, proper. One might wonder, for example, why processors are still built with the ability to self-modify code, when (1) it's been considered extremely bad form for code to self-modify for at least 50 years now, and (2) the key requirement for self-modification—the identity of the program memory with the state memory—has been known to be a key bottleneck in computing speed for just as long. But the answer is in this implicitly executed code. Every time you load a program, you are in effect running self-modifying code; what is data—the file on disk or in memory—is loaded into the memory and then executed. This free transfer between data and code is what enables programs to have individuality within a running computer environment.

    I would argue, then, that the valence of the implicitness of this code is not so neutral as the word implicit might imply. In fact, this extratextual activity is deliberately hidden, precisely in order to create the space in which the textuality of the code can come into being. So this fits, IMO, with a view of code whereby code becomes code only through hiding the interdependence of code and state. But I hadn't considered this case of the empty program before, so thanks for this wonderful example.

  • @ebuswell said:
    In fact, this extratextual activity is deliberately hidden, precisely in order to create the space in which the textuality of the code can come into being.

    There's something a little sad about the runtime environment getting everything ready when we start nothing.py. It initializes the stack and heap, prepares garbage collection, loads libraries it thinks will be needed, and whatever else is happening in that list of calls at the top of the page. Then the big moment arrives and it ends as soon as it starts.

  • @ebuswell said:
    ...Even if it does compile to a zero byte code, I doubt very much that the proper procedure to run it would be to load a tape with no data. Operating these machines was itself often a task of loading one thing in order to load another....
    ...in order to create the space in which the textuality of the code can come into being...

    this to me feels very similar to other cognitive tools. e.g. take a day planner. the layout of the pages facilitates organization of the information inside it. as long as a user abides by the planner's model when inputting appointments, it will accurately reflect a schedule. thus, the user doesn't have to maintain moment to moment awareness of their schedule, but rather can trust the system they have established in relationship to the planner to track it for them.

    a compiler feels like a cognitive tool in the same way, but tuned to the task of communicating with a microprocessor, a task that has been getting more and more abstract since before punchcards. a lot of that cognitive system used to be tasked to the human -- organizing the cards and loading the tapes in the right order -- but it seems like it is increasingly being stuffed in the layer of abstraction. the compiler expends resources trying to guess what you want, which is convenient only when it guesses right.

    @Temkin said:
    There's something a little sad about the runtime environment getting everything ready when we start nothing.py.

    i also notice myself starting to anthropomorphize the compiler. i wonder what attributes we assign to our compilers. what kinds of characters are they and what does that imply about our relationships with our machines? e.g. why did you imagine the compiler as sad and not full of rage?

  • edited January 2020

    I remember my brain getting tickled in all kinds of funny ways on an elective, engineering-based web development course I took years ago. We were gradually building our own MVC framework, and the lecturer, I think out of habit, never called programs "programs", but "extensions". Extensions of the framework.

    All programs are extensions of what is already there. This perspective shifts the unit of analysis; it is not that @barry.rountree's nothing.py was an "empty program", but instead it was an empty extension of what was already there. A program is "a program" (an object, a Thing) in only some sense; in another sense a program, empty or not, is input to a function call.

    Nice job :)

  • In line with studying yes and true, I love it how /dev/null is not empty, but rather a whole lot of container technology for nothing.

  • edited January 2020

    @jeremydouglass between Python2 and Python3 I think there were some performance-motivated changes to how the interpreter searches for modules which likely reduced the number of system calls, but this lwn.net article suggests startup times actually double between 2.7 and 3.7.
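
    For the curious, here is a rough way to measure that start-up cost at home (a sketch; the interpreter names, script path, and run count are assumptions to adjust):

        import subprocess, time

        def average_startup(interpreter, script="./nothing.py", runs=10):
            # wall-clock average of launching an interpreter on the empty script
            start = time.time()
            for _ in range(runs):
                subprocess.call([interpreter, script])
            return (time.time() - start) / runs

        for interp in ("python2", "python3"):    # assumed to be on PATH
            print("%s: %.3f s" % (interp, average_startup(interp)))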

    These long startup times form a bit of a fitness-for-purpose paradox for Python. The main places they are visible to me (outside the supercomputing world -- where perhaps we get to put a little bit of the blame on NFS :smile:) are either when we use Python as a scripting language in places where we would previously have used shell scripts or Perl, and then call these Python scripts repeatedly in yet another script which then seems to be mysteriously slow; or when we use awesome, easy-to-use Python GUI modules (or perhaps @Mace.Ojala's extensions? frameworks? libraries?) to quickly make some nifty front-end that then takes forever to start up anytime the user launches it.

    Yet these two scenarios encompass much of what I (as a Python producer and consumer) love it for. I like writing shell scripts but with more familiar control flow and less fiddly comparison operators and dictionaries and almost-as-good-as-Perl regular expression group matches. I love cross-platform GUIs with minimal effort and no thinking about manual memory management when I mostly just want pretty widgets to show up on the screen. But every time I write Python in either scenario I'm painfully aware of the tradeoff I'm making that partially undermines my very purpose.

    I guess I was discussing efficiency a bit in the interview thread as well, and I'm sure that there is a lot that can/has been said about this obsession with respect to cultural and economic ideals of productivity, but I think the first example in particular gets at a fundamental issue with computer systems -- leaky abstractions. A single individual (or a class of those-who-know) can affect exponentially more people/nodes by hiding away something in a "punctualized node". I guess this also has resonances with @nickm's mention of the IOCCC.

    @barry.rountree do you think the art of "when to depunctualize" is simply the art of debugging, i.e. build a system and when it exhibits pathological behavior figure out why? Or is there a theoretical framework for where to be suspicious when using/building/extending a system?

  • @James.Larkby-Lahet -- very interesting point, and a reminder that it is of course possible for the number of events in a log to be cut in half while the total time simultaneously doubles.

    @Mace.Ojala said:
    In line with studying yes and true, I love it how /dev/null is not empty, but rather a whole lot of container technology for nothing.

    This example of how accounting for absence can be conceptually or technically complex rather than simple puts me in mind of the mathematical history of the emergence of zero as a placeholder and later as a number in its own right -- it is an advanced concept that is developed slowly (for example, Brahmagupta lays out rules for using zero, but one of them is a/0 = 0, which makes little sense to our contemporary understanding of division and fractions).

    In computer culture, it also recalls for me contemporary discussions around null references as a programming language feature -- and the popular framing of null references by their inventor Tony Hoare as a "billion-dollar mistake."

    Earlier @jang said:
    There's no way - other than careful code review - to check the applicability of human-readable language in comments.

    It is true that the relationship between comments and code is often nuanced and can only be understood by code review. In addition, there is research in comment processing -- both in designing commenting conventions for parsing and in examining free-form comments using data mining and analytics tools.

    Some language comment systems use conventions -- like JavaDoc -- that can enable you to automatically check whether many aspects of arguments, return types, exceptions, etc. as described in the comments / documentation do or do not line up with the code as written. Other comment tools use data mining and heuristics or machine learning to try to identify potential problems in large code bases by looking for comment-code mismatches or red flags of various kinds. There is also some work on automatically detecting whether code comments are "explanatory" or not -- not whether they are correct, but whether they are the kinds of statements that are positively correlated with program comprehension. Flagging a comment as potentially out of date or misleading may also be done by checking version control histories and detecting when code changes but accompanying comments do not. It is an interesting problem in the same general space as linting, pre-commit, and code coverage tools of various kinds.
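
    A toy Python 3 sketch of the convention-checking idea, assuming a simple ":param name:" docstring convention (nothing like the scale of the research tools mentioned above):

        import inspect, re

        def check_docstring_params(func):
            """Report documented parameters that don't line up with the signature."""
            documented = set(re.findall(r":param (\w+):", func.__doc__ or ""))
            actual = set(inspect.signature(func).parameters)
            return {"undocumented": actual - documented,
                    "stale": documented - actual}

        def area(width, height):
            """Compute an area.

            :param width: the width
            :param depth: stale -- this argument was renamed to height long ago
            """
            return width * height

        print(check_docstring_params(area))
        # {'undocumented': {'height'}, 'stale': {'depth'}}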

    Here are some selected examples from 1990, 2000, and 2018. Back in 2010 I did a brief lit. review of the history of "comments considered harmful" vs. "literate programming" as part of a talk on code comments (see around 12m40s).

  • This program is like the computer version of Cage's 4'33''. It is also sort of like x-ray vision into world building - the world without us / the world without our code.

  • @meredith.noelle : Great minds think alike. (See the first line below my TL;DR.)

    Once upon a workshop I was discussing the quirks of the ParaDiS dislocation simulation (think of it as a crack propagation simulation for materials science) with my advisor when another participant said "Excuse me, that sounds fascinating. What does ParaDiS do?" To which I promptly replied: "It runs to completion." I then explained that I'm not a materials scientist, I don't know the science behind what the code does, but I'm obsessively interested in what happens to the hardware while that kind of code is running.

    I had a somewhat similar experience when my partner was trying to run several million simulations written in the R programming language. I took a look at the code, and while I have no idea what it was doing from a theoretical ecology standpoint, I was able to get the code running about 100x faster and then ported it to run on a supercomputer at LLNL (speeding it up by another factor of 100). I still don't know what that code did, but I'm now third author on a paper in the American Naturalist.

    What does this mean for how I approach CCS? I'm probably more aware that there are dozens to hundreds of layers of code in between what I'm reading on the screen and what the processor finally executes; that each of those layers is the result of a set of goals, worldviews and compromises of potentially dozens of authors separated in space and time; and that each of those layers provides some abstraction of what's just below to the code running just above. Poking around in the gaps between those layered abstractions and ground truth has been endlessly fascinating from both a computer science and humanities perspective.

  • @Mace.Ojala : I really like the "extension" metaphor, and it works perfectly for interpreted programs and frameworks. I've been trying to puzzle out whether it works as well for the operating system being an extension of the firmware, a shell being an extension of the operating system, a compiled program being an extension of the shell. In a sense, yes --- certainly. But having a separate stack and virtual memory space complicates things a bit. Definitely worth passing along to my students, though.

  • @James.Larkby-Lahet : Yes, leaky abstractions are very relevant here. I'm reluctant to say that debugging is simply the art of useful depunctualization, but I'm having a really difficult time coming up with a nontrivial example where that isn't the case. The trivial examples would be simple misreading, lack of understanding, etc. Not very interesting.

  • This thread makes me think of the underlying processes in interfaces for keyboards, mouse/trackpads, and touchscreens, especially when tracking users. And because I’m more of a poetry person, I’m using a bit of personification.

    The command line interface waits for keystrokes and accumulates them on the command line (we call it writing) until one hits enter/return to then execute the command. A blinking cursor in a text editor operates in a similar fashion, patiently waiting for the next input. With predictive keyboards the last sequence of typed letters or words become the basis of tracking and predicting the user’s intentions and next inputs.

    The pointer on a GUI responds to the input it gets from the mouse/touchscreen, executing commands when the user hits a button or some other kind of input. The pointer location is a way the computer can track the user— Philippe Bootz calls it the symbolic presence of the user in the work (of eliterature). In a way it always knows where the user is and that is information the software can use, as is used extensively in video games.

    The touchscreen interface fascinates me because I picture it as a sensitive surface that awaits contact, but is always surprised when the contact comes. It has no way of knowing where it will be touched next, except with sweeps and when it has interfaces that limit and direct touch, such as virtual keyboards.

    These different contact points are moments in which we communicate with computers, and they read us as we read and write on/with them.

  • edited January 2020

    @Leonardo.Flores : You might have noticed how much simpler captchas have gotten --- some now just have a checkbox next to the text "I am not a robot." It's trivial to write code that will automatically check that box. What the captchas are looking for now, though, is the pattern of mouse movements humans use when they do that task. Mimicking how a human moves a mouse turns out to be a lot more difficult for bots, which is why those captchas are so effective. We are indeed being read by the machine.

  • I’m looking for the like button in this platform, @barry.rountree.

  • @Leonardo.Flores Along those same lines: Comedy Written for the Machines (NYTimes). A much darker recitation of how machines read us.
