Tuesday 21 July 2020

(practical-python->racket 1.4 Strings)

The format of Practical Python Programming section 1.4 Strings is a little different from section 1.3 Numbers. The exercises are designed to be entered into the Python console rather than written as complete, albeit short, programs. I have created a single Racket program, strings.rkt, that covers both the examples and the exercises.

Python seems to have a plethora of string literal types, delimited by ' ', " ", ''' ''', """ """, f' ', f" ", r' ', r" ", b' ' and b" ". In Python there is no difference between a string delimited with ' ' and one delimited with " ". Both are single-line strings that expand escaped characters. ''' and """ are used for multi-line strings. The other options are r strings (raw strings), b strings (byte strings) and f strings (format strings).

As far as I know Racket has two string literal forms, one delimited with " ", the other byte strings delimited with #" ". It was a pleasant surprise to find that Racket string literals can span multiple lines.

Translating most of the examples and exercises from Python to Racket was quite straightforward. I'll only mention the few that weren't.

In Python you can specify negative indices which count back from the end of the string. It's a nice convenience but isn't that difficult to manage without by calculating the index through reference to the string length.
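As a small illustration of that equivalence, here is a sketch of the Python side. The string value is just an example. In Racket the same length-based calculation can be written with string-ref and string-length, e.g. (string-ref s (- (string-length s) 1)).

```python
# Illustrative string; any non-empty string behaves the same way.
s = "Hello world"

# A negative index counts back from the end of the string...
last_char = s[-1]
third_from_end = s[-3]

# ...which is shorthand for an index calculated from the string length.
last_char_by_length = s[len(s) - 1]
third_from_end_by_length = s[len(s) - 3]

print(last_char, last_char_by_length)            # d d
print(third_from_end, third_from_end_by_length)  # r r
```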

Python's string replication using the * operator (hello_5 = "Hello" * 5) proved to be a little more challenging. In trying to come up with a solution, I stumbled across Racket's for loop. It was quite a surprise to find a for loop in Racket. Using for and set!, I was able to come up with a translation to Racket.

(define sss "Hello")
(define original-sss (string-copy sss))
(for ([i (in-range 1 5)])
     (set! sss (string-append sss original-sss)))
(displayln sss)

My Racket isn't anywhere near as concise as the Python though it could form the basis for a crude string-replicate function. Personally, I feel that such a function would be so rarely used that it wouldn't be worth writing one.
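For comparison, the same loop-and-append idea can be sketched in Python and checked against the built-in * operator. The name string_replicate is hypothetical, chosen only for this illustration.

```python
def string_replicate(s, n):
    """Hypothetical helper: return s repeated n times by repeated appending."""
    result = ""
    for _ in range(n):
        result = result + s
    return result


hello_5 = "Hello" * 5
print(string_replicate("Hello", 5))             # HelloHelloHelloHelloHello
print(string_replicate("Hello", 5) == hello_5)  # True
```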

Python's string find and index methods return the index of the start of a substring found within a string. My first attempt to translate them was (index-of (string->list s) t). However, this only works when searching for a single character. After a little more searching, I found that by using a regular expression I could duplicate the Python methods - (car (car (regexp-match-positions t s))). (The car-car is needed as Racket's regexp-match-positions returns a list of pairs of start and end positions.)

The regular expression to translate the Python rfind and rindex string methods, which search backwards from the end of a string, is a little more complicated - (car (last (regexp-match-positions (regexp (string-append ".*(" t ").*$")) s))). It is perhaps easier to understand if split into two expressions:

(define reggie (regexp (string-append ".*(" t ").*$")))
(car (last (regexp-match-positions reggie s)))
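To show why the greedy pattern lands on the last occurrence, here is a sketch of the same two regular-expression searches written in Python and compared with str.find and str.rfind. The function names are hypothetical. The call to re.escape is an addition that is not in the Racket above: it stops characters such as . or * in t being treated as pattern syntax (Racket's regexp-quote does the same job).

```python
import re


def find_with_regex(s, t):
    """Index of the first occurrence of t in s, or -1 (like str.find)."""
    m = re.search(re.escape(t), s)
    return m.start() if m else -1


def rfind_with_regex(s, t):
    """Index of the last occurrence of t in s, or -1 (like str.rfind)."""
    # The leading greedy .* consumes as much as it can, so the capture
    # group ends up on the last occurrence of t.
    m = re.search(".*(" + re.escape(t) + ").*$", s, re.DOTALL)
    return m.start(1) if m else -1


s = "Hello world"
print(find_with_regex(s, "o"), s.find("o"))    # 4 4
print(rfind_with_regex(s, "o"), s.rfind("o"))  # 7 7
print(rfind_with_regex(s, "z"), s.rfind("z"))  # -1 -1
```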

To emulate the Python isalpha string method, I needed to resort to a regular expression again as I couldn't find anything similar in the Racket standard library. At first this seemed easy: (regexp-match? #px"^[a-zA-Z]+$" s) worked well against English strings. I then remembered that Python has pretty good Unicode support. I checked and confirmed that Python's isalpha recognises more than a to z as "alpha". I was struggling to find a solution in Racket so I posted a message on the Racket Mailing List. Ryan Culpepper of the Racket Team not only very kindly explained how characters are classified in Unicode, he also showed me how this information could be accessed in Racket and how there are special regular expression character classes in Racket that can be used to process the Unicode character classification.

If that wasn't enough, he also pointed out the need to normalise strings to "composed" form (NFC) when using those special regex character classes. (regexp-match? #px"^\\p{L}+$" (string-normalize-nfc s)) is my resulting equivalent of Python's isalpha. (Note the doubled backslash, which is needed inside a Racket regexp literal.)
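The effect of normalisation can be seen from the Python side too. The sketch below uses the standard unicodedata module to apply the same rule as the Racket expression: normalise to NFC, then require every code point to be in a Unicode letter category. The helper name is_alpha and the sample strings are only illustrative.

```python
import unicodedata


def is_alpha(s):
    """Hypothetical helper: True if s is non-empty and, after NFC
    normalisation, every code point has a letter category (Lu, Ll, Lt, Lm, Lo)."""
    composed = unicodedata.normalize("NFC", s)
    return len(composed) > 0 and all(
        unicodedata.category(ch).startswith("L") for ch in composed
    )


# "café" written with a plain e followed by a combining acute accent.
decomposed = "cafe\u0301"

print(decomposed.isalpha())  # False: the combining accent is a mark, not a letter
print(is_alpha(decomposed))  # True: NFC turns e + accent into the single letter é
print(is_alpha("abc123"))    # False
```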

One significant difference between Python and Racket strings, which was a little surprising to me, is that whilst Python strings are immutable, Racket has both immutable and mutable strings. A Racket string created from a "" enclosed string literal is immutable; a string created with the string function is mutable. This short Racket REPL session shows the difference:

> (define immutable "Hallo")
> (string-set! immutable 1 #\e)
; string-set!: contract violation
;   expected: (and/c string? (not/c immutable?))
;   given: "Hallo"
;   argument position: 1st
; [,bt for context]
> (define mutable (string #\H #\a #\l #\l #\o))
> (string-set! mutable 1 #\e)
> mutable
"Hello"
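For contrast, here is a sketch of the Python behaviour. Assigning to a character position raises a TypeError, so the usual workaround is to build a new string from slices. The variable names are only illustrative.

```python
greeting = "Hallo"

# Python strings are immutable, so item assignment is rejected.
try:
    greeting[1] = "e"
    outcome = "mutated"
except TypeError:
    outcome = "TypeError"

# Instead, build a new string from slices of the old one.
fixed = greeting[:1] + "e" + greeting[2:]

print(outcome)   # TypeError
print(greeting)  # Hallo
print(fixed)     # Hello
```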

As far as I can tell Racket doesn't have any string interpolation features like Python's f strings. It was pretty straightforward to translate the f string example but much more wordy.
  Python:
    f'{shares} shares of {name} at ${price:0.2f}'
  Racket:
     (displayln
       (string-append
         (~a shares)
         " shares of "
         name
         " at $"
         (real->decimal-string price 2)))
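To make the correspondence explicit, here is a sketch in Python that builds the same line twice, once with an f string and once piece by piece in the way the Racket version does. The values for shares, name and price are illustrative assumptions.

```python
# Illustrative values, assumed for this example only.
shares = 100
name = "IBM"
price = 91.1

# The f string does the conversion and formatting inline...
line = f"{shares} shares of {name} at ${price:0.2f}"

# ...whereas the piecewise version converts each value explicitly,
# mirroring string-append, ~a and real->decimal-string.
pieces = str(shares) + " shares of " + name + " at $" + format(price, "0.2f")

print(line)            # 100 shares of IBM at $91.10
print(line == pieces)  # True
```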

Both Python and Racket are considered "Batteries included" languages. I'm worried that my translations of Python to Racket might give the impression that Python has the bigger batteries. So to ensure a little better balance on that score, Racket not only has equivalents of Python's lower() and upper() string methods (string-downcase and string-upcase) but also has string-titlecase and string-foldcase.

Finally, these are very basic string examples, as is to be expected in a course of the nature of Practical Python Programming. They do not cover any of the nuances of supporting Unicode. I have been led to believe that whilst Python has pretty good Unicode support, it falls down a little as it doesn't handle different locales well. As I understand it, Racket does fully support different locales.

Next, 1.5 Lists
    
