\ INTERNATIONAL.FTH - ANS draft internationalisation (( Copyright (c) 2001 MicroProcessor Engineering 133 Hill Lane Southampton SO15 5AF England tel: +44 (0)23 8063 1441 fax: +44 (0)23 8033 9691 net: mpe@mpeforth.com tech-support@mpeforth.com web: www.mpeforth.com From North America, our telephone and fax numbers are: 011 44 23 8063 1441 011 44 23 8033 9691 PLEASE NOTE THAT OUR PHONE NUMBER CHANGED IN APRIL 2000 You are free to use this code in any way, as long as the MPE copyright notice in this section is retained. This code is an implementation of the draft ANS internationalisation specification available from the download area of the MPE web site. The implementation provides more functionality than is required by the ANS draft standard and provides enough hooks to be the basis of a practical system. To do ===== Change history ============== 20031210 SFP003 Updated for VFX Forth 3.60 20010425 SFP002 Added GET-ESCAPE 20010326 SFP001 First release )) decimal \ ======================================================================= ( *! i18n ) ( ** ) ( *T Internationalisation ) ( ** ) ( *P Internationalisation often requires support for strings longer ) ( ** than the 255 characters supported by counted strings in the 8 bit ) ( ** character set used by VFX Forth during application development. ) ( ** Such strings may also not be in the character set or size used ) ( ** by the application developer. ) ( *P Internationalisation often requires third parties to be able to ) ( ** convert text strings without having to recompile the application. ) ( *P Forth system developers and vendors need to make their systems ) ( ** compatible with their clients existing approaches to ) ( ** internationalisation. ) ( *P This implementation supports all these requirements, and is ) ( ** a compatible superset of the current ANS Forth Internationalisation) ( ** proposals, which are ) ( ** available from the downloads section of the MPE web site at: ) ( ** http://www.mpeforth.com ) ( *P If you are using this software with MPE's VFX Forth system, the ) ( ** source code is in the file LIB\INTERNATIONAL.FTH. ) ( *P MPE acknowledges the help and support of Construction Computer ) ( ** Software, Cape Town, South Africa, in the design of this software. ) ( ** The CCS application has been internationalised for many years, and ) ( ** their experience has been invaluable, both in defining the draft ) ( ** ANS standard and in developing this code. ) \ ======================================================================= \ ******************************** ( *S Long string parsing support ) \ ******************************** [undefined] parse/l [if] variable ^StringBuff \ -- addr ; pointer to buffer on heap variable #StringBuff \ -- addr ; length used in buffer : FreeStringBuff \ -- ; release string buffer ^StringBuff @ ?dup if free drop ^StringBuff off #StringBuff off endif ; : AllocStringBuff \ -- ; ensure buffer is allocated FreeStringBuff #32768 allocate abort" Can't allocate string buffer" ^StringBuff ! #StringBuff off ; : parse-more \ char -- flag c-addr len ; flag is true if more needed >r \ character to scan for source >in @ /string \ skip start of buffer ; -- addr(s) len(s) over swap r> scan >r \ scan for terminator ; -- addr(s) addr(e) over - \ calculate string ; -- addr(s) len(s) dup r@ 0<> - >in +! \ calculate length to update >IN r> 0= -rot \ return flag ; : parse/l \ char -- c-addr len ; like PARSE over lines ( *G Parse the next token from the terminal input buffer using as) ( ** the delimiter. The text up to the delimiter is returned as a ) ( ** c-addr u string. PARSE/L does not skip leading delimiters. In ) ( ** order to support long strings, PARSE/L can operate over multiple ) ( ** lines of input and line terminators are not included in the text. ) ( ** The string returned by PARSE/L remains in a single global buffer ) ( ** until the next invocation of PARSE/L. PARSE/L is designed for use ) ( ** at compile time and is not thread-safe or winproc-safe. ) AllocStringBuff begin \ -- char dup parse-more \ -- char flag c-addr len tuck ^StringBuff @ #StringBuff @ + swap move #StringBuff +! while refill 0= abort" Missing closing double quotes" repeat drop ^StringBuff @ #StringBuff @ ; [then] \ ******************** ( *S Data structures ) \ ******************** ( *N Rationale ) ( *P Although internationalised strings may be referenced by the ) ( ** addresses of suitable data structures, these addresses will change ) ( ** from build to build of the application. The implementation here ) ( ** permits strings to be given a number which does not change between ) ( ** builds. Together with a compile-time hook which can generate a ) ( ** text file in the development language, application strings can be ) ( ** translated in external text files without rebuilding the ) ( ** application. This is required in situations in which translation ) ( ** is performed locally by dealers or by users themselves. ) ( *P The /TEXTDEF structure described below permits messages to be ) ( ** accessed either by message number or by the address of the ) ( ** structure. ) ( *N /TEXTDEF structure ) ( *P Internationalisation of messages relies on a data structure ) ( ** /TEXTDEF. The /TEXTDEF structure ) ( ** contains a link to the previous TEXTDEF or #TEXTDEF definition, a ) ( ** message identifier which is 0 for non-databased strings in the ISO ) ( ** Latin1 coding, the address of the text, and the length of the ) ( ** text in bytes. The text is followed by two zero bytes, and the ) ( ** text is long aligned. The /TEXTDEF structure is a superset of the ) ( ** /ERRDEF structure used for error messages by VFX Forth. ) ( *P The words #TEXTDEF and ERR$ are DEFERred. #TEXTDEF is used by ) ( ** TEXTDEF. The user can install alternative versions of these words ) ( ** for internationalised applications. In this context, #TEXTDEF and ) ( ** friends can be used as the basis of any text handler that requires ) ( ** translation. Note that #TEXTDEF can be modified so that a message ) ( ** file is produced at compile time, and ERR$ modified so that the ) ( ** message file is accessed at run time. Similarly, providing that ) ( ** the application language is correctly handled, the run time can ) ( ** access translated messages in other languages, character sets and ) ( ** character sizes. ) ( *P The messages are linked into the same chain as is used for ) ( ** all error strings that can be internationalised. This chain ) ( ** is anchored by the variable TEXTCHAIN. ) ( *E struct /textdef \ -- len ; DOES NOT include constant definition ) ( ** int td.value \ value that identifies string ( ** int td.link \ link to previous TEXTDEF td.link field) ( ** int td.id \ 0 or message ID ) ( ** int td.caddr \ address of text string ) ( ** int td.len \ length of text string ) ( ** int td.lenInline \ length of inline text string in bytes ) ( ** end-struct ) ( *N String structure ) ( ** Multiline text is laid down as an inline string whose first byte ) ( ** is aligned on a four-byte boundary with two zero bytes as ) ( ** termination. The end of the string is also aligned on a ) ( ** four-byte boundary. ) struct /textdef \ -- len ; i18n text structure int td.value \ value that identifies string int td.link \ link to previous TEXTDEF td.link field int td.id \ 0 or message ID int td.caddr \ address of text string int td.len \ length of text string in bytes int td.lenInline \ length of inline text string in bytes end-struct \ ******************************************** ( *S Creating and referencing LOCALE strings ) \ ******************************************** ( *P In this implementation, the ANS locale string identifier "lsid" ) ( ** is a pointer to a /TEXTDEF structure. ) defer l$CompileHook \ ^textdef -- ( *G A DEFERred hook that the user can modify to produce additional ) ( ** data at compile time. For example, the hook is commonly replaced ) ( ** by code that generates a text file in the development language. ) ( ** This text file then serves as the basis for translation to other ) ( ** languages. ) ' drop is l$CompileHook : L$", \ n -- ; compile a long string ( *G This can be thought of as a multiline version of ",. ) ( ** First a /TEXTDEF structure is created. Then it collects multiline ) ( ** text and lays down an inline string with two ) ( ** zero bytes as termination. The start of the string is aligned on a ) ( ** four-byte boundary. The end of the string is padded to a four-byte ) ( ** boundary. ) align here >r /textdef allot&erase \ reserve TEXTDEF structure r@ td.value ! \ set value field TextChain @ r@ td.link ! \ update chain r@ td.link TextChain ! \ keep /ERRDEF compatible align [char] " parse/l \ -- addr len dup r@ td.lenInline ! \ set inline string length in TEXTDEF dup here r@ td.caddr 2! \ set here/len to TEXTDEF tuck here swap movex \ copy string allot 0 w, align \ reserve space for string r> l$CompileHook [defined] VFXForth [if] \ VFX Forth specific state @ if discard-sinline endif [then] ; : (#TextDef) \ n -- ; -- n ; used as message codes create l$", \ lay TEXTDEF and string does> td.value @ ; defer #TEXTDef \ n -- ; -- n ( *G Define a constant and associated message in the form: ) ( ** #TEXTDEF "". Execution of returns . ) ' (#TEXTDEF) is #TEXTDEF : NextText \ -- addr ( *G Returns the address of the variable holding the next constant used ) ( ** to identify an internationalised string. ) NextError ; : NextText# \ -- n ( *G Return the contents of NEXTTEXT and increment NEXTTEXT. ) NextText @ NextText Incr ; : TextDef \ -- ; -- n ; used as throw/error codes ( *G Define a constant and associated message in the form: ) ( ** TEXTDEF "". Execution of returns the constant ) ( ** automatically allocated by NEXTTEXT#. ) NextText# #TextDef \ constant and update ; : l$find \ n -- struct|0 ; produce pointer to TEXTDEF structure ( *G Given a message number n, return the address of the /TEXTDEF ) ( ** structure containing its details. ) TextChain @ begin dup while 2dup cell- @ = if nip cell - exit endif ed.link @ repeat 2drop 0 ; : l$count \ lsid -- c-addr u ( *G Given a /TEXTDEF structure, the address and length in bytes of the ) ( ** text string are returned. ) dup td.caddr @ swap td.len @ ; : l$addr \ lsid -- c-addr ( *G Given a /TEXTDEF structure, the address of the text string is ) ( ** returned. ) td.caddr @ ; [defined] VFXForth [if] -sin [then] : (l$") \ -- lsid ( *G The runtime action of L$" to return the address of the /TEXTDEF ) ( ** structure associated with the string compiled by L$". ) r> aligned \ -- ^textdef dup dup td.lenInline @ 2+ + \ add length of inline string /Textdef + \ add length of /TEXTDEF structure aligned >r ; : L$" \ -- ; -- lsid ( *G Used inside a colon definition to compile a string that will be ) ( ** internationalised. At run time the address of the TEXTDEF ) ( ** structure will be returned. ) ?comp postpone (l$") NextText# l$", ; immediate : LS" \ -- ; -- caddr u ( *G Used to compile or extract a long string. When used during ) ( ** compilation L$", is used to a lay down a string for internationalisation. ) ( ** At run time the address and length of the string are returned. ) state @ if postpone (l$") NextText# l$", postpone l$count else [char] " PARSE/L endif ; IMMEDIATE : ZLS" \ -- ; -- c-addr ( *G Used to compile or extract a zero terminated long string. When used during ) ( ** compilation L$", is used a lay a string for internationalisation. ) ( ** At run time the address of the string is returned. ) state @ if postpone (l$") NextText# l$", postpone l$addr else [char] " PARSE/L drop endif ; IMMEDIATE [defined] VFXForth [if] +sin [then] \ ************************ ( *S ANS LOCALE word set ) \ ************************ ( *P In this implementation, the ANS locale string identifier "lsid" ) ( ** is a pointer to a /TEXTDEF structure. ) variable defer set-language \ lang -- ior ( *G Set the current language code. At the very least, the action of this word must ) ( ** be to set the variable . The action may also include ) ( ** updating the string data in the TD.CADDR and TD.LEN fields of all ) ( ** the /TEXTDEF and /ERRDEF structures. ) ( ** If the operation succeeds, the returned ior is 0. If the operation ) ( ** fails, the returned ior is non-zero and the meaning of the ior is ) ( ** implementation dependent. ) :noname ! 0 ; is set-language : get-language \ -- lang ( *G Return the current language code. ) @ ; variable defer set-country \ country -- ior ( *G Set the current country code. At the very least, the action of ) ( ** this word must be to set the variable . The action may ) ( ** also include updating locale-sensitive routines such as date and ) ( ** time display formatting words. ) ( ** If the operation succeeds, the returned ior is 0. If the operation ) ( ** fails, the returned ior is non-zero and the meaning of the ior is ) ( ** implementation dependent. ) :noname ! 0 ; is set-country : get-country \ -- country ( *G Return the current country code. ) @ ; : l" \ -- ; -- lsid ; L" " \ *G A locale-sensitive version of C" which returns an lsid (string \ ** indentifier) at run-time. The native text may be compiled inline \ *P Interpetation: \ ** The interpretation semantics for this word are undefined. \ *P Compilation: \ "ccc" -- \ ** Parse ccc delimited by a " (double-quote) and append the run-time \ ** semantics given below to the current definition. \ *P Runtime: \ -- lsid \ ** Return lsid, an identifier for a locale string. Other words use \ ** lsid to extract language specific information. postpone l$" ; immediate : LOCALE@ \ lsid -- addr len(au) \ *G Return the address and length in address units of the string (in the \ ** current language) that corresponds to the native string identified by \ ** lsid. The format of the string at addr is implementation dependent. \ ** The length of the string is returned in address units so that it may \ ** be copied by MOVE without knowledge of the character set width. l$count ; defer SUBSTITUTE \ i*x addr1 len1 addr2 len2 - j*y addr2 len3 \ *G Perform macro substitution on the lstring at addr1/len1 placing the \ ** result at lstring addr2/len2, returning addr2 and len3, the length \ ** of the resulting string. Ambiguous conditions occur if the resulting \ ** string will not fit into addr2/len2, or macro text cannot be found, \ ** or if the lstring at addr2/len2 overlaps the lstring at addr1/len1. \ ** Macros may take parameters from the Forth data stack. \ *P When a macro name delimited by escape characters (see SET-ESCAPE) \ ** is encountered by SUBSTITUTE, the following action occurs: \ *D 1) If the name is a valid macro name, a locale and implementation \ ** dependent action occurs \ *D 2) If the name is null, a single escape character is substituted \ *D 3) In all other cases an ambiguous condition exists. [defined] VFXForth [if] \ MPE VFX Forth :noname drop Expandmacro ; is substitute ' substitutions voc>wid constant wid-substitutions [else] #256 buffer: name[ \ -- addr ; scratch buffer to hold macro name [undefined] place [if] : PLACE \ c-addr1 u c-addr2 -- \ Copy the string described by c-addr1 u as a counted string at \ the memory address described by c-addr2. 2dup 2>r 1 chars + swap move 2r> c! ; [then] [undefined] textmac-delimiter [if] char % value textmac-delimiter [then] vocabulary substitutions also substitutions get-order over constant wid-substitutions set-order previous : (substitute) \ srce slen dest dlen -- dest dlen' \ *G Expand source using macros. Note that this version is simplistic, \ ** performs no error checking, and requires a global buffer. 0 locals| dlen' dlen dest slen srce | dest dlen erase \ initialise destination string srce slen begin dup while over c@ textmac-delimiter <> if \ character not % over c@ dest dlen' + c! \ add character to output dlen' 1+ to dlen' 1 /string \ step on one source character else over 1 + c@ textmac-delimiter = if \ %% for one output % textmac-delimiter \ add one % to output dest dlen' + c! dlen' 1+ to dlen' 2 /string \ step over 2 source characters else 1 /string \ step over leading % 2dup textmac-delimiter scan \ get trailing % to form macro name \ -- saddr slen eaddr elen rot drop \ -- saddr eaddr elen rot 2 pick over - \ -- eaddr elen saddr nlen name[ place \ -- eaddr elen ; set into name buffer name[ count upper \ force upper case name[ count wid-substitutions search-wordlist if \ look for macro name execute count dup >r \ -- saddr len maddr mlen dest dlen' + swap move \ add string to dest r> dlen' + to dlen' \ update destination string length then over c@ textmac-delimiter = \ if we really had a trailing % if 1 /string endif \ step over it endif endif repeat 2drop dest dlen' ; ' (substitute) is substitute \ create b1 ", aaa%lib%bbb" \ 256 buffer: b2 [then] defer set-macro \ addr len c-addr u -- \ *G Define the localised string addr/len in address units as the text \ ** to substitute for the macro of the name (in the development \ ** character set) c-addr/u. If the macro does not exist it is created. :noname 2dup wid-substitutions search-wordlist if nip nip \ discard c-addr u execute \ get macro buffer address else also substitutions definitions ($create) \ like CREATE but takes caddr/len previous definitions here $0100 allot dup $100 erase \ create buffer space else then place \ copy as counted string ; is set-macro : SET-ESCAPE \ locale-char -- \ *G Set the macro escape character to be the localised character \ ** locale-char. By default it is the ASCII % character if it is \ ** available in the application character set. to textmac-delimiter ; : GET-ESCAPE \ -- locale-char \ *G Return the macro escape character locale-char. \ ** By default it is the ASCII % character if it is \ ** available in the application character set. See SET-ESCAPE. textmac-delimiter ; \ ********************************** ( *S ANS LOCALE extension word set ) \ ********************************** ( *P In this implementation, the ANS locale string identifier "lsid" ) ( ** is a pointer to a /TEXTDEF structure. ) defer LOCALE-INDEX \ lsid -- \ *G Updates the internal data structure. Useful if structures are added \ ** and changes to internal structures are required. ' drop is locale-index : LOCALE-LINK \ lsid1 -- lsid2 \ *G Given the address of one LOCALE structure, returns the address \ ** of the next. td.link @ cell - ; defer LOCALE-TYPE \ addr len -- \ *G Displays the LOCALE string whose address and length in address \ ** units are given. ' type is locale-type : NATIVE@ \ lsid -- c-addr len \ *G Given a LOCALE structure, returns the address and length of the \ ** corresponding DCS native string that was compiled by L". dup /Textdef + swap td.lenInline @ ; \ *************************** \ *S Windows language support \ *************************** \ *P Windows contains a large number of predefined language constants \ ** of the form LANG_xxx and SUBLANG_xxx. A Windows locale is identified \ ** by merging a pair of these as described below. \ *E +-------------------------+-------------------------+ \ ** | SubLanguage ID | Primary Language ID | \ ** +-------------------------+-------------------------+ \ ** 15 10 9 0 bit \ *P These constants can be viewed from VFXForth by using: \ *C SIM LANG_ \ *C SIM SUBLANG_ \ *P These codes use 0 as the current or neutral code, which matches \ ** using 0 as the language code for the development character set, \ ** which is ISO Latin 1 for VFX Forth. In this set, the seven bit \ ** ASCII character set defined by ANS Forth represents characters \ ** 0..127. : langID \ primary secondary -- langid \ *G Generate a Windows language code from the primary and secondary \ ** codes, e.g. \ *C LANG_SPANISH SUBLANG_SPANISH_MEXICAN langid 10 lshift or ; \ ********* \ test code \ ********* [defined] testccs [if] [endif]