heredocs in lisp

Shell script and Perl have heredocs. They are handy because you don't have to worry about quoting metacharacters in a large body of data.

(Depending on which flavour you use, of course. You can interpolate variables if you wish.)

Extracted from Lisp newbie,

You could probably cook up a funky reader-macro to do heredocs in lisp. Probably a good first exercise in reader-macro-land.

Is it a good idea?

[before finding it can be done] Maybe the answer is "that would be pointless". Still, with Perl I've grown accustomed to being able to paste a chunk of stuff into the source and then mess with it (for example, outdent it; this allows it to be indented in the source, which aids legibility). Indeed, being able to read the inline POD can be handy too.

I suspect that this is caused by "thinking in strings" instead of lists. Still, if I were writing HTML-from-sexpr source I would rapidly grow tired of "quoting strings".

Implementing it

Well since it is "probably a good first exercise", I won't ask you to show me how. Watch this space (but not too hard).

This is actually not as easy as it looks. For example:

(list #>END 2 3
some text
END
 4)

should read as

(list "some text\n" 2 3 4)

But to do that, you need to have the ordinary reader take over again between the marker and the start of the here-doc.

Lisp already has multi-line strings

Standard "string" syntax can contain literal newlines - as often used for function docstrings.

* (defvar *foo* "This is
my sample
multiline string!

Whee.")

*FOO*
* (write-line *foo*)
This is
my sample
multiline string!

Whee.
"This is
my sample
multiline string!

Whee."
*

However this does require literal " and \ to be backslashed.
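
A small illustration of that cost, as a sketch only: the variable name *ESCAPED* and the text in it are made up for this example.

```lisp
;; In standard string syntax only " and \ need a backslash in front.
(defparameter *escaped*
  "He said \"use C:\\tmp\" and left.")

;; The backslashes used for escaping are not part of the string itself.
(write-line *escaped*)
;; prints: He said "use C:\tmp" and left.
```

Two escapes in one short line is bearable; a pasted page of HTML or shell script is where it starts to hurt.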

Lisp already has balanced comment syntax

The standard reader treats #| ..stuff.. |# as a comment. See http://www.lispworks.com/documentation/HyperSpec/Body/02_dhs.htm for details.
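
As a quick sketch of what "balanced" means here (the function name ANSWER is invented for the example): an inner #| ... |# pair nests, so the comment only ends at the matching outer |#.

```lisp
(defun answer ()
  #| outer comment
     #| nested comment |#
     still inside the outer comment |#
  42)

(answer)
;; => 42
```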

Is it Lispy to extend existing #| and ## reader notation in a form such as

(list
#1|put some text here
with whatever "quotes" you need
|1# 'and #1# "use it" #1# "often")
?

Advantage: it may be easier to teach editors what's going on, if they already understand #|...|# .

SBCL currently gives "WARNING: A numeric argument was ignored in #1|."

Motivation

[rework in progress on this unsigned text]

Also, as far as I can tell the motivation for the heredoc syntax is the ability to do stuff like this in shell (note the redirection):

sed-or-awk <<END > target
...script...
END

There the clarity of having the redirection on a single line is worth something, but I really wonder if this is worth anything in lisp.

I'm well aware that Lisp has multi-line strings -- my point was just that although this is labeled as a "my first readtable" exercise, it's not even clear to me that it's possible without essentially writing your own reader. And while it's nice to be able to say

(let ((files #>this #>that #>the-other)) ... )
...
this
... ...
the-other
it's probably not worth the effort.

It's not much effort, if I correctly understand what you want here. Here's a quick off-the-cuff implementation (untested...may have typos, etc.)

(defun read-string-to (terminal stream)
  (let ((current 0))
    (with-output-to-string (out)
      (loop
        (let ((char (read-char stream t nil t)))
          (if (char= char (char terminal current))
              (when (= (incf current) (length terminal)) (return))
              (progn
                (write-string terminal out :start 0 :end current)
                (if (char= char (char terminal 0))
                    (setq current 1)
                    (progn
                      (setq current 0)
                      (write-char char out))))))))))

(defun read-heredoc (stream char arg)
  (declare (ignore arg))
  (read-string-to (read-string-to (string char) stream) stream))

(set-dispatch-macro-character #\# #\> #'read-heredoc)

Just type #>foo> ... foo

Not quite...

* #>foo>
bar
foo

"
bar
"
* (list #>foo> 1 2 3)
bar
foo
C-c C-c ;; (i.e. interrupt it).

I know you can make a concatenated-stream out of the rest of the line plus the stream after the end-marker, but I can't think of a way to make the reader continue reading from this new stream instead of the one it's working on (short of hacking the SBCL source, that is). Once again, it's not that important, it's just not a newbie exercise.

Huh? The above example works perfectly, you just didn't finish your list. You wrote the equivalent of

(list " 1 2 3)
bar
"

What you should have written was

(list " 1 2 3)
bar
")

Note the closing paren. In your example, that's the same as

(list #>foo> 1 2 3
bar
foo)

Heredocs are a bit stranger than that:
~/src% perl
print <

This is what makes them more than just fat quotation marks -- you can tersely include a large block of text as a string in the middle of an expression. One way to do this might be to have the heredoc dispatch macro just record the heredoc's name and an associated gensym, then install a reader macro for newline that would read in the document contents, and set the gensym's value to the text.

No you can't. You need to be able to detect the end of the current top-level form and start reading from there, not just the next newline; and that wouldn't actually work, either, because you'd need to somehow delay evaluation of the form until after the values were set (in case it was being typed at the REPL or loaded as source). I.e., it'd all be very ugly. ISTM better to use something like this as-is, with LET to bind the variables first. But maybe READ-HEREDOC should return a string-input-stream rather than a string -- in the shell, using <
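
To illustrate the LET suggestion, here is a minimal sketch. It uses an ordinary string literal as a stand-in for the heredoc, so it runs without any reader macro installed; the function name UPCASE-LINES and the variable names are made up for the example. Wrapping the string in a string-input-stream gives roughly what a shell command sees on its standard input.

```lisp
(defun upcase-lines (text)
  "Read TEXT line by line through a string input stream, upcasing each line."
  (with-input-from-string (in text)
    (loop for line = (read-line in nil)
          while line
          collect (string-upcase line))))

;; Bind the document first, then use it inside the form.
(let ((text "line one
line two
"))
  (upcase-lines text))
;; => ("LINE ONE" "LINE TWO")
```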


Similar features elsewhere

CHICKEN ("A practical and portable Scheme system") has a non-standard #<<TAG read syntax for multi-line strings. It works as a Perl or shell programmer would expect (possible omission of trailing \n?). This is not Common Lisp; I only mention it for interest.

The implementation READ-STRING-TO above is broken when a partial match that fails already contains the beginning of the real match; luckily this case is covered and described by Graham in ANSI Common Lisp.

Imagine the following situation:
(with-open-stream (stream (make-string-input-stream "foo bar ababac"))
  (read-string-to "abac" stream))

Result: END-OF-FILE gets signaled because "abab" is flushed without the second "ab" being re-used for "abac" at the end.
Inspired by Graham's suggested ring buffer solution I just wrote the following for evol:

;;;; ring-buffer.lisp
;;; Operations on ring buffers, as described by Paul Graham in ANSI Common Lisp.
(defstruct ring-buffer
  "Structure defining ring buffers utilizing a simple VECTOR of fixed size and
four indices:
START: Index of first live value
END:   Index of last live value
USED:  Beginning of current match
NEW:   End of current match"
  vector (start -1) (used -1) (new -1) (end -1))

(defun new-ring-buffer (length)
  "new-ring-buffer length => ring-buffer

Create a new RING-BUFFER containing a simple character vector of fixed size LENGTH."
  (make-ring-buffer :vector (make-array length :element-type 'character)))

(defun rbref (buffer index)
  "rbref buffer index => character or #\Nul

Return character stored at INDEX in ring BUFFER."
  (char (ring-buffer-vector buffer)
        (mod index (length (ring-buffer-vector buffer)))))

(defun (setf rbref) (value buffer index)
  "setf (rbref buffer index) value => value

SETF for RBREF. If INDEX > LENGTH of BUFFER, start over at the beginning."
  (setf (char (ring-buffer-vector buffer)
              (mod index (length (ring-buffer-vector buffer))))
        value))

(defun ring-buffer-insert (buffer value)
  "ring-buffer-insert buffer value => value

Increment END of BUFFER inserting VALUE at the new index."
  (setf (rbref buffer (incf (ring-buffer-end buffer))) value))

(defun ring-buffer-reset (buffer)
  "ring-buffer-reset buffer => end-index

Reset match beginning/end indices USED and NEW in BUFFER to START and END."
  (setf (ring-buffer-used buffer) (ring-buffer-start buffer)
        (ring-buffer-new buffer) (ring-buffer-end buffer)))

(defun ring-buffer-pop (buffer)
  "ring-buffer-pop buffer => character

Increment START of BUFFER returning VALUE at the new index. Additionally, reset the BUFFER match indices."
  (prog1 (rbref buffer (incf (ring-buffer-start buffer)))
    (ring-buffer-reset buffer)))

(defun ring-buffer-next (buffer)
  "ring-buffer-next buffer => character or nil

Return next match character incrementing USED in BUFFER or simply NIL if none are left."
  (when (< (ring-buffer-used buffer) (ring-buffer-new buffer))
    (rbref buffer (incf (ring-buffer-used buffer)))))

(defun ring-buffer-clear (buffer)
  "ring-buffer-clear buffer => -1

Reset all indices of BUFFER to their initial state."
  (setf (ring-buffer-start buffer) -1
        (ring-buffer-used buffer) -1
        (ring-buffer-new buffer) -1
        (ring-buffer-end buffer) -1))

(defun ring-buffer-flush (buffer)
  "ring-buffer-flush buffer => string

Flush all unused characters in BUFFER."
  (with-output-to-string (out)
    (do ((index (1+ (ring-buffer-used buffer)) (1+ index)))
        ((> index (ring-buffer-end buffer)))
      (write-char (rbref buffer index) out))))

;;;; heredoc.lisp
(defun read-until-match (stream terminal)
  "read-until-match stream terminal => string

Read characters from STREAM until a sequence equal to string TERMINAL is read.
Return all characters read as string omitting TERMINAL itself.
Signal error upon EOF."
  (with-output-to-string (out)
    (do* ((match-length (length terminal))
          (buffer (new-ring-buffer match-length))
          (buffer-char nil)
          (char (read-char stream t :eof t)
                (or (setf buffer-char (ring-buffer-next buffer))
                    (read-char stream t :eof t)))
          (match-pos 0))
         ((eql char :eof))
      (cond ((char= char (char terminal match-pos))
             (when (= (incf match-pos) match-length)
               (return))
             (unless buffer-char
               (ring-buffer-insert buffer char)))
            ((zerop match-pos)
             (write-char char out)
             (when buffer-char
               (ring-buffer-pop buffer)))
            (t
             (unless buffer-char
               (ring-buffer-insert buffer char))
             (write-char (ring-buffer-pop buffer) out)
             (setf match-pos 0))))))

(defun read-heredoc (stream char arg)
  "read-heredoc stream char arg => string

Return string from STREAM up to the point where the string read first until CHAR is encountered.
All evaluation is completely turned off so no quoting is required at all.
Example: #>eof>Write whatever (you) \"want\"!eof => Write whatever (you) \"want\"!"
  (declare (ignore arg))
  (read-until-match stream (read-until-match stream (string char))))

(set-dispatch-macro-character #\# #\> #'read-heredoc)

I haven't written any unit tests for this yet but all cases I've tried worked like a charm.
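
For comparison, here is a shorter sketch that avoids the overlap problem in a different way. This is not Graham's ring buffer technique: it simply accumulates everything read so far and checks after each character whether the accumulated text ends with the terminator. The names READ-UNTIL-SUFFIX and READ-HEREDOC-SIMPLE are made up for this example, and the macro is installed in a copy of the readtable so the global one is left alone. The price of the simplicity is a string comparison per character read, which is fine for short terminators but does more work than the ring buffer version.

```lisp
(defun read-until-suffix (terminal stream)
  "Read from STREAM until the text read so far ends with TERMINAL.
Return that text without TERMINAL. Signals END-OF-FILE if TERMINAL never appears."
  (let ((buf (make-array 0 :element-type 'character
                           :adjustable t :fill-pointer 0))
        (n (length terminal)))
    (loop
      (vector-push-extend (read-char stream t nil t) buf)
      ;; START is where TERMINAL would begin if BUF currently ends with it.
      (let ((start (- (length buf) n)))
        (when (and (>= start 0)
                   (string= terminal buf :start2 start))
          (return (subseq buf 0 start)))))))

(defun read-heredoc-simple (stream char arg)
  "Reader macro function: read the marker up to CHAR, then the body up to the marker."
  (declare (ignore arg))
  (read-until-suffix (read-until-suffix (string char) stream) stream))

;; The input that trips up READ-STRING-TO:
(with-input-from-string (in "foo bar ababac")
  (read-until-suffix "abac" in))
;; => "foo bar ab"

;; Try the reader macro without touching the global readtable.
(let ((*readtable* (copy-readtable)))
  (set-dispatch-macro-character #\# #\> #'read-heredoc-simple)
  (read-from-string "(list #>END>some \"text\"END 2 3)"))
;; => (LIST "some \"text\"" 2 3)
```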

