RfD - Memory access
===================

Change History
==============

20160809 Added another comment and corrected spellings.
         Added more address spaces.
20130926 Update at Hamburg meeting. Removed environmental queries.
20110925 Minor corrections; inserted some discussion comments.
20110922 Wordsmithing for Vienna meeting.
20100924 Rewrite at Hamburg meeting.
20100301 Updated after discussion on groups and lists,
         Changed word names to those used by Mitch Bradley as they
         have more precedence than those originally proposed.
20100225 Corrections to section numbering,
         Corrections to reference implementation,
         Added Josh Grams unit tests,
         Moved L words to EXT.
20100224 Revised reference implementation.
20090923 Restored B@ and B!.
20090921 Forth200x review release.
20090902 Forth200x review update.
20090829 First writing according to the Proposals Process as
         described in the Draft 200x Standard.

Problem
=======

Forth 2012 lacks a portable way of accessing memory elements of a
fixed width or byte ordering. Such access is useful for sharing data
between applications on the same or different machines.

Solution
========

A new set of words is proposed, the MEMORY-ACCESS wordset, to provide
access to data of fixed width and byte ordering. This proposal
assumes that the data are byte oriented.

Typical use
===========

    CREATE DATA   1 L,  2 W,  3 B,

    DATA  DUP L@ .
    4 BYTES +  DUP W@ .
    2 BYTES +  B@ .

Remarks
=======

1. These words form the new Memory Access wordset.

2. The names follow the notation:

       [<order>-]<size><action>[<address space>]

   where:

     order:   BE  for big endian
              LE  for little endian
     size:    B   for 8-bit byte
              W   for 16-bit word
              L   for 32-bit long-word
              X   for 64-bit extended-word
     action:  !, @ or , with the usual meaning

   Systems with multiple address spaces, e.g.
   Harvard architectures and cross compilers, often require an
   address-space indicator:

     address space:  C  code space
                     R  register space
                     T  target address space in cross compiler
                     D  Debug link

   The default byte order (<order>) is the native byte order for the
   implementation unless otherwise indicated. The default
   address-space (<address space>) is the data space.

3. The term 'native order' has been chosen as 'local order' might be
   confused with some order related to local variables. The word
   'host' is also used in the Berkeley Sockets API.

4. When operating on data larger than an address unit, memory
   operations shall be capable of unaligned operation, e.g. when
   fetching a 32-bit item from an address that is not 32-bit aligned,
   the operation will succeed.

5. Systems shall not implement words requiring or returning items
   larger than the cell size. Doing so only causes portability issues
   rather than solving them.

The rationale for these remarks is that data transfer standards exist
that are big endian, e.g. TCP/IP, and little endian, e.g. USB. This
forces us to make a clear distinction between big-endian,
little-endian and native data.

For cell-addressed machines that use an address unit larger than
8 bits, it is assumed that the upper part of a cell is simply ignored.
This proposal makes no attempt to deal with packing of bytes for
memory efficiency. Provided that other operations such as TYPE
produce the expected result, the implementation may deal with packing
of data as it sees fit. However, the model of one byte per cell will
always work with least implementation complexity.

Proposal
========

18. The optional Memory Access word set

[LW: If we intend this to be a recommendation rather than a candidate
for inclusion in the Standard itself, this presumed insertion in
ch. 18 is incorrect and these items should be renumbered.]

18.1 Introduction
-----------------

All memory access operations in this wordset are defined on one or
more successive address-aligned bytes.
In MEMORY-ACCESS, wherever it says "store" and "fetch", the following
applies:

(1) It is assumed that bytes do not require address alignment.
(2) For address units larger than 8 bits, each address unit contains
    one byte stored in the eight least significant bits. Additional
    bits shall be set to zero by the fetch operations.
(3) Fetching an item larger than the cell size results in truncation;
    the higher-order bits are undefined.

This wordset shall not be implemented on systems with address units
less than 8 bits.

The words in this wordlist generally take the form:

    [<order>-]<size><action>[<address space>]

where:

  order:   BE  for big endian
           LE  for little endian
  size:    B   for 8-bit byte
           W   for 16-bit word
           L   for 32-bit long-word
           X   for 64-bit extended-word
  action:  !, @ or , with the usual meaning

Systems with multiple address spaces, e.g. Harvard architectures and
cross compilers, often require an address space indicator:

  address space:  C  code space
                  D  Data space
                  I  Input/Output space, e.g. x86
                  R  register space
                  T  target address space in cross compiler
                  L  Debug link

[SFP: I know that Data space is default, but it is often
stylistically better to have both code and data space markers. I
suggest that we reserve D, and use 'L' for target and JTAG Links.]

[LW: As you may recall, I don't care for 'J' for the debug link so I
propose 'D' for "debug." We need to consider combinations of these,
e.g., target code space, target data space, debug link code space,
etc. Should there be a long list of address-space identifiers, or do
we concatenate these? If the latter, the order should be specified.
Lexicographic is probably easiest, even when the result reads funny.]

If the <order> prefix is absent, the native byte order for the
implementation is assumed. The default address space is data space.

18.2 Additional terms
---------------------

big endian:
    The most significant byte of a multi-byte value is stored at the
    lowest memory address. Also known as network order.
little endian:
    The least significant byte of a multi-byte value is stored at the
    lowest memory address.

native order:
    The byte ordering for multi-byte values used by @ and !.
    (See big endian and little endian.)

18.3 Additional usage requirements
----------------------------------

18.3.1 Data types
-----------------

Append table 18.1 to table 3.1.

Table 18.1: Data Types

    Symbol   Data type                Size on stack
    w-addr   16-bit-aligned address   1 cell
    l-addr   32-bit-aligned address   1 cell

18.3.1.1 Addresses
------------------

Adding the size of a 16-bit word to a 16-bit-aligned address shall
produce a 16-bit-aligned address.

Adding the size of a 32-bit long-word to a 32-bit-aligned address
shall produce a 32-bit-aligned address.

18.4 Additional documentation requirements
------------------------------------------

None

18.5 Compliance and labeling
----------------------------

18.5.1 Forth 2012 systems
-------------------------

The phrase "Providing the Memory Access word set" shall be appended
to the label of any Standard System that provides all of the Memory
Access word set.

The phrase "Providing name(s) from the Memory Access Extensions word
set" shall be appended to the label of any Standard System that
provides portions of the Memory Access Extensions word set.

The phrase "Providing the Memory Access Extensions word set" shall be
appended to the label of any Standard System that provides all of the
Memory Access and Memory Access Extensions word sets.

18.5.2 Forth 2012 programs
--------------------------

The phrase "Requiring the Memory Access word set" shall be appended
to the label of Standard Programs that require the system to provide
the Memory Access word set.

The phrase "Requiring name(s) from the Memory Access Extensions word
set" shall be appended to the label of Standard Programs that require
the system to provide portions of the Memory Access Extensions word
set.
The phrase "Requiring the Memory Access Extensions word set" shall be
appended to the label of Standard Programs that require the system to
provide all of the Memory Access and Memory Access Extensions word
sets.

18.6 Glossary
-------------

18.6.1 Memory Access words
--------------------------

18.6.1.aaaa  B!  "b-store"  MEMORY-ACCESS
    ( x addr -- )
    Store the 8 least significant bits of x at addr.

18.6.1.aaab  B@  "b-fetch"  MEMORY-ACCESS
    ( addr -- x )
    Fetch the 8 least significant bits of x from addr.

18.6.1.aaac  BE-W!  "b-e-w-store"  MEMORY-ACCESS
    ( x addr -- )
    Store the 16 least significant bits of x at addr in big-endian
    format. addr does not need to be aligned.

18.6.1.aaad  BE-W,  "b-e-w-comma"  MEMORY-ACCESS
    ( x -- )
    Reserve 16 bits of data space and store the 16 least significant
    bits of x in them in big-endian format. The data-space pointer
    does not need to be aligned.

18.6.1.aaae  BE-W@  "b-e-w-fetch"  MEMORY-ACCESS
    ( addr -- x )
    Fetch the 16 least significant bits of x from addr in big-endian
    format. addr does not need to be aligned.

18.6.1.aaaf  BFIELD:  "b-field-colon"  MEMORY-ACCESS
    ( -- )
    The semantics of BFIELD: are identical to the execution semantics
    of the phrase:

        1 BYTES +FIELD

    See: 10.6.2.---- +FIELD, 10.6.2.---- BEGIN-STRUCTURE,
    10.6.2.---- END-STRUCTURE and 18.6.1.aaaf BYTES.

[delete]
18.6.1.aaaf  BYTES  "bytes"  MEMORY-ACCESS
    ( n1 -- n2 )
    n2 is the size in address units of n1 8-bit units.
[/delete]

18.6.1.aaag  LE-W!  "l-e-w-store"  MEMORY-ACCESS
    ( x addr -- )
    Store the 16 least significant bits of x at addr in little-endian
    format. addr does not need to be aligned.

18.6.1.aaah  LE-W,  "l-e-w-comma"  MEMORY-ACCESS
    ( x -- )
    Reserve 16 bits of data space and store the 16 least significant
    bits of x in them in little-endian format. The data-space pointer
    does not need to be aligned.

18.6.1.aaai  LE-W@  "l-e-w-fetch"  MEMORY-ACCESS
    ( addr -- x )
    Fetch the 16 least significant bits of x from addr in
    little-endian format. addr does not need to be aligned.

18.6.1.aaaj  W!
"w-store" MEMORY-ACCESS ( x w-addr -- ) Store the 16 least significant bits of x at w-addr in native order. 18.6.1.aaak W, "w-comma" MEMORY-ACCESS ( x -- ) Reserve 16 bits of data space and store the 16 least significant bits of x in them in native order. The data-space pointer must be word aligned. 18.6.1.aaal W@ "w-fetch" MEMORY-ACCESS ( w-addr -- x ) Fetch the 16 least significant bits of x from w-addr in native order. 18.6.1.aaam WALIGN "w-align" MEMORY-ACCESS ( -- ) If the data-space pointer is not 16-bit aligned, reserve enough space to align it. See: 3.3.3 Data space, 3.3.3.1 Address alignment. 18.6.1.aaan WALIGNED "w-aligned" MEMORY-ACCESS ( addr -- w-addr ) w-addr is the first 16-bit aligned address greater than or equal to addr. See: 3.3.3.1 Address alignment, 18.6.1.aaam WALIGN. 18,6,1,aaao WFIELD: "w-field-colon" MEMROY-ACCESS ( -- ) The semantics of WFIELD: are identical to the execution semantics of the phrase: WALIGN 2 BYTES +FIELD See: 10.6.2.---- +FIELD, 10.6.2.---- BEGIN-SDTRUCTURE, 10.6.2.---- END-STRUCTURE, 18.6.1.aaaf BYTES and 18.6.1.aaam WALIGN. 18.6.2 Memory Access extension words ------------------------------------ 18.6.2.aaaa BE-L! "b-e-l-store" MEMORY-ACCESS EXT ( x addr -- ) Store the 32 least significant bits of x at addr in big-endian format. addr does not need to be aligned. 18.6.2.aaab BE-L, "b-e-l-comma" MEMORY-ACCESS EXT ( x -- ) Reserve 32 bits of data space and store the 32 least significant bits of x in them in big-endian format. The data-space pointer does not need to be aligned. 18.6.2.aaac BE-L@ "b-e-l-fetch" MEMORY-ACCESS EXT ( addr -- x ) Fetch the 32 least significant bits of x from addr in big-endian format. addr does not need to be aligned. 18.6.2.aaad L! "l-store" MEMORY-ACCESS EXT ( x l-addr -- ) Store the 32 least significant bits of x at l-addr in native order. 18.6.2.aaae L, "l-comma" MEMORY-ACCESS EXT ( x -- ) Reserve 32 bits of data space and store the 32 least significant bits of x in them in native order. 
    The data-space pointer must be 32-bit aligned.

18.6.2.aaaf  L@  "l-fetch"  MEMORY-ACCESS EXT
    ( l-addr -- x )
    Fetch the 32 least significant bits of x from l-addr in native
    order.

18.6.2.aaag  LALIGN  "l-align"  MEMORY-ACCESS EXT
    ( -- )
    If the data-space pointer is not 32-bit aligned, reserve enough
    space to align it.

    See: 3.3.3 Data space, 3.3.3.1 Address alignment.

18.6.2.aaah  LALIGNED  "l-aligned"  MEMORY-ACCESS EXT
    ( addr -- l-addr )
    l-addr is the first 32-bit aligned address greater than or equal
    to addr.

    See: 3.3.3.1 Address alignment, 18.6.2.aaag LALIGN.

18.6.2.aaai  LE-L!  "l-e-l-store"  MEMORY-ACCESS EXT
    ( x addr -- )
    Store the 32 least significant bits of x at addr in little-endian
    format. addr does not need to be aligned.

18.6.2.aaaj  LE-L,  "l-e-l-comma"  MEMORY-ACCESS EXT
    ( x -- )
    Reserve 32 bits of data space and store the 32 least significant
    bits of x in them in little-endian format. The data-space pointer
    does not need to be aligned.

18.6.2.aaak  LE-L@  "l-e-l-fetch"  MEMORY-ACCESS EXT
    ( addr -- x )
    Fetch the 32 least significant bits of x from addr in
    little-endian format. addr does not need to be aligned.

18.6.2.aaal  LFIELD:  "l-field-colon"  MEMORY-ACCESS EXT
    ( -- )
    The semantics of LFIELD: are identical to the execution semantics
    of the phrase:

        LALIGN 4 BYTES +FIELD

    See: 10.6.2.---- +FIELD, 10.6.2.---- BEGIN-STRUCTURE,
    10.6.2.---- END-STRUCTURE, 18.6.1.aaaf BYTES and
    18.6.2.aaag LALIGN.

A.18 The optional Memory Access word set
========================================

Forth programs frequently have to transfer data over protocols that
define the byte order of the data being transferred. Data transfer
standards exist that are big endian, e.g. TCP/IP, and little endian,
e.g. USB. This forces us to make a clear distinction between
big-endian, little-endian and native data.

The term 'native order' has been chosen as 'local order' might be
confused with some order related to local variables. The word 'host'
is also used in the Berkeley Sockets API.
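
The distinction between byte orders can be illustrated with a minimal
sketch, which is not part of the proposal. It assumes a system that
already provides the words proposed here (for instance, with the
reference implementation given later in this document loaded). The
buffer name HDR and its contents are invented for the example.

```forth
\ HDR is a hypothetical 4-byte buffer used only for this illustration.
CREATE HDR  4 BYTES ALLOT

HEX
\ Fill the buffer with the bytes 12 34 56 78 in ascending address order.
12 HDR B!              34 HDR 1 BYTES + B!
56 HDR 2 BYTES + B!    78 HDR 3 BYTES + B!

HDR BE-W@ .            \ bytes 12 34 read as big endian:    displays 1234
HDR LE-W@ .            \ bytes 12 34 read as little endian: displays 3412
HDR 1 BYTES + BE-W@ .  \ unaligned read of bytes 34 56:     displays 3456
DECIMAL
```

The same two bytes yield different values depending only on the order
prefix of the word used, and the last line fetches from an address
that is not 16-bit aligned. The displayed values agree with the
corresponding cases in the Testing section.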

When operating on data larger than an address unit, memory operations
shall be capable of unaligned operation, e.g. when fetching a 32-bit
item from an address that is not 32-bit aligned, the operation will
succeed.

Systems shall not implement words requiring or returning items larger
than the cell size. For example, L@ would be required to return two
16-bit values on 16-bit systems, while on a 32-bit system it would be
easier if it returned a single 32-bit value. In order to avoid
confusion the 32- and 64-bit words are defined in the extended words
section so that only those systems with 32- or 64-bit cells need
provide them.

For cell-addressed machines, which use an address unit larger than
8 bits, it is assumed that the upper part of a cell is simply ignored.
There is no attempt to deal with packing of bytes for memory
efficiency. Provided that other operations such as TYPE produce the
expected result, the implementation may deal with packing of data as
it sees fit. However, the model of one byte per cell will always work
with least implementation complexity.

Should a program ever need to detect the native order of the system,
it can do so by using the following code:

    $1234 PAD !  PAD B@ $34 =

This is true when the program is running on a little-endian system
and false otherwise.

BFIELD:, WFIELD: and LFIELD: align the data pointer to the host
system's preferred alignment. +FIELD can be used to define unaligned
data fields.

Reference Implementation
========================

This implementation makes three assumptions:

(1) It is working on a byte-addressed system.
(2) A character is stored as a byte.
(3) A stack cell is a minimum of 32 bits wide.

\ Assuming a Char is 8-bits - Not valid on some systems
: B! ( x addr -- )  SWAP $FF AND SWAP C!
;
: B@ ( addr -- x )  C@ $FF AND ;

SYNONYM BYTES CHARS  ( n1 -- n2 )

\ Internal helper words (not part of the proposal)

\ Shift x1 left one byte, add the byte at addr1, then advance the address
: b@+ ( x1 addr1 -- x2 addr2 )  SWAP 8 LSHIFT OVER B@ + SWAP 1 BYTES + ;
\ Step the address back, shift x1 left one byte, add the byte found there
: b@- ( x1 addr1 -- x2 addr2 )  1 BYTES - DUP B@ ROT 8 LSHIFT + SWAP ;
\ Store the low byte of x1 at addr1, advance the address, shift x1 right
: b!+ ( x1 addr1 -- x2 addr2 )  2DUP B! 1 BYTES + SWAP 8 RSHIFT SWAP ;
\ Step the address back, store the low byte of x1 there, shift x1 right
: b!- ( x1 addr1 -- x2 addr2 )  1 BYTES - 2DUP B! SWAP 8 RSHIFT SWAP ;

\ Big-endian Memory Access Words
: BE-W@ ( addr -- x )  0 SWAP b@+ b@+ DROP ;
: BE-L@ ( addr -- x )  0 SWAP b@+ b@+ b@+ b@+ DROP ;
: BE-W! ( x addr -- )  2 BYTES + b!- b!- 2DROP ;
: BE-L! ( x addr -- )  4 BYTES + b!- b!- b!- b!- 2DROP ;
: BE-W, ( x -- )  HERE 2 BYTES ALLOT BE-W! ;
: BE-L, ( x -- )  HERE 4 BYTES ALLOT BE-L! ;

\ Little-endian Memory Access Words
: LE-W@ ( addr -- x )  0 SWAP 2 BYTES + b@- b@- DROP ;
: LE-L@ ( addr -- x )  0 SWAP 4 BYTES + b@- b@- b@- b@- DROP ;
: LE-W! ( x addr -- )  b!+ b!+ 2DROP ;
: LE-L! ( x addr -- )  b!+ b!+ b!+ b!+ 2DROP ;
: LE-W, ( x -- )  HERE 2 BYTES ALLOT LE-W! ;
: LE-L, ( x -- )  HERE 4 BYTES ALLOT LE-L! ;

\ Native Memory Access Words
$1234 PAD !  PAD B@ $34 = [IF]   \ Little Endian System
  SYNONYM W@ LE-W@   SYNONYM L@ LE-L@
  SYNONYM W! LE-W!   SYNONYM L! LE-L!
  SYNONYM W, LE-W,   SYNONYM L, LE-L,
[ELSE]                           \ Big Endian System
  SYNONYM W@ BE-W@   SYNONYM L@ BE-L@
  SYNONYM W! BE-W!   SYNONYM L! BE-L!
  SYNONYM W, BE-W,   SYNONYM L, BE-L,
[THEN]

\ Alignment Words
: WALIGN ( -- )  HERE 1 AND ALLOT ;
: LALIGN ( -- )  4 HERE 3 AND - 3 AND ALLOT ;
: WALIGNED ( addr1 -- addr2 )  1 + [ 1 INVERT ] LITERAL AND ;
: LALIGNED ( addr1 -- addr2 )  3 + [ 3 INVERT ] LITERAL AND ;

Testing
=======

HEX

T{ 12345678 PAD BE-L! -> }T
T{ 0 PAD 4 BYTES + L!
-> }T

\ BE-L@
T{ PAD BE-L@ -> 12345678 }T
T{ PAD 1 BYTES + BE-L@ -> 34567800 }T

\ BE-W@
T{ PAD BE-W@ -> 1234 }T
T{ PAD 1 BYTES + BE-W@ -> 3456 }T
T{ PAD 2 BYTES + BE-W@ -> 5678 }T
T{ PAD 3 BYTES + BE-W@ -> 7800 }T

\ B@
T{ PAD B@ -> 12 }T
T{ PAD 1 BYTES + B@ -> 34 }T
T{ PAD 2 BYTES + B@ -> 56 }T
T{ PAD 3 BYTES + B@ -> 78 }T

\ LE-L@
T{ PAD LE-L@ -> 78563412 }T
T{ PAD 1 BYTES + LE-L@ -> 00785634 }T

\ LE-W@
T{ PAD LE-W@ -> 3412 }T
T{ PAD 1 BYTES + LE-W@ -> 5634 }T
T{ PAD 2 BYTES + LE-W@ -> 7856 }T
T{ PAD 3 BYTES + LE-W@ -> 0078 }T

\ B!
T{ 0 PAD BE-L!  FFFFFFFF PAD 1 BYTES + B! -> }T
T{ PAD BE-L@ -> 00FF0000 }T

\ BE-W!
T{ 0 PAD BE-L!  FFFFFFFF PAD 1 BYTES + BE-W! -> }T
T{ PAD BE-L@ -> 00FFFF00 }T
T{ 12345678 PAD 1 BYTES + BE-W! -> }T
T{ PAD BE-L@ -> 00567800 }T

\ BE-L!
T{ 0 PAD BE-L!  FFFFFFFF PAD 1 BYTES + BE-L! -> }T
T{ PAD BE-L@  PAD 4 BYTES + BE-L@ -> 00FFFFFF FF000000 }T
T{ 12345678 PAD 1 BYTES + BE-L! -> }T
T{ PAD BE-L@  PAD 4 BYTES + BE-L@ -> 00123456 78000000 }T

\ B!
T{ 0 PAD LE-L!  FFFFFFFF PAD 1 BYTES + B! -> }T
T{ PAD LE-L@ -> 0000FF00 }T

\ LE-W!
T{ 0 PAD LE-L!  FFFFFFFF PAD 1 BYTES + LE-W! -> }T
T{ PAD LE-L@ -> 00FFFF00 }T
T{ 12345678 PAD 1 BYTES + LE-W! -> }T
T{ PAD LE-L@ -> 00567800 }T

\ LE-L!
T{ 0 PAD LE-L!  FFFFFFFF PAD 1 BYTES + LE-L! -> }T
T{ PAD LE-L@  PAD 4 BYTES + LE-L@ -> FFFFFF00 000000FF }T
T{ 12345678 PAD 1 BYTES + LE-L! -> }T
T{ PAD LE-L@  PAD 4 BYTES + LE-L@ -> 34567800 00000012 }T

\ WALIGNED
T{ 0 BYTES WALIGNED -> 0 BYTES }T
T{ 1 BYTES WALIGNED -> 2 BYTES }T
T{ 2 BYTES WALIGNED -> 2 BYTES }T
T{ 3 BYTES WALIGNED -> 4 BYTES }T

\ LALIGNED
T{ 0 BYTES LALIGNED -> 0 BYTES }T
T{ 1 BYTES LALIGNED -> 4 BYTES }T
T{ 2 BYTES LALIGNED -> 4 BYTES }T
T{ 3 BYTES LALIGNED -> 4 BYTES }T
T{ 4 BYTES LALIGNED -> 4 BYTES }T
T{ 5 BYTES LALIGNED -> 8 BYTES }T

\ ToDo: BYTES
\ ToDo: BE-W, LE-W, W! W, W@ WALIGN
\ ToDo: BE-L, LE-L, L!
\       L, L@ LALIGN
\ ToDo: BFIELD: WFIELD: LFIELD:

Authors
=======

Federico de Ceballos
Universidad de Cantabria
federico.ceballos@unican.es

Stephen Pelc
MicroProcessor Engineering Ltd
stephen@mpeforth.com

Peter Knaggs
University of Exeter
pjk@bcs.org.uk

Leon Wagner
FORTH, Inc.
leon@forth.com