/* Coding system handler (conversion, detection, etc.).
   Copyright (C) 1995,97,1998,2002,2003  Electrotechnical Laboratory, JAPAN.
   Licensed to the Free Software Foundation.
   Copyright (C) 2001,2002,2003  Free Software Foundation, Inc.

This file is part of GNU Emacs.

GNU Emacs is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2, or (at your option)
any later version.

GNU Emacs is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with GNU Emacs; see the file COPYING.  If not, write to
the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA.  */
/*** TABLE OF CONTENTS ***

  0. General comments
  1. Preamble
  2. Emacs' internal format (emacs-mule) handlers
  3. ISO2022 handlers
  4. Shift-JIS and BIG5 handlers
  5. CCL handlers
  6. End-of-line handlers
  7. C library functions
  8. Emacs Lisp library functions

*/

/*** 0. General comments ***/
/*** GENERAL NOTE on CODING SYSTEMS ***

  A coding system is an encoding mechanism for one or more character
  sets.  Here is a list of the coding systems that Emacs can handle.
  "Decode" means converting text from some other coding system to
  Emacs' internal format (emacs-mule), and "encode" means converting
  text from emacs-mule to some other coding system.

  0. Emacs' internal format (emacs-mule)

  Emacs itself holds a multi-lingual character in buffers and strings
  in a special format.  Details are described in section 2.

  1. ISO2022

  The most famous coding system for multiple character sets.  X's
  Compound Text, various EUCs (Extended Unix Code), and coding
  systems used in Internet communication such as ISO-2022-JP are
  all variants of ISO2022.  Details are described in section 3.

  2. SJIS (or Shift-JIS or MS-Kanji-Code)

  A coding system to encode character sets: ASCII, JISX0201, and
  JISX0208.  Widely used for PC's in Japan.  Details are described in
  section 4.

  3. BIG5

  A coding system to encode the character sets ASCII and Big5.  Widely
  used for Chinese (mainly in Taiwan and Hong Kong).  Details are
  described in section 4.  In this file, when we write "BIG5"
  (all uppercase), we mean the coding system, and when we write
  "Big5" (capitalized), we mean the character set.

  4. Raw text

  A coding system for text containing random 8-bit code.  Emacs does
  no code conversion on such text except for end-of-line format.

  5. Other

  If a user wants to read/write text encoded in a coding system not
  listed above, they can supply a decoder and an encoder for it as CCL
  (Code Conversion Language) programs.  Emacs executes the CCL program
  while reading/writing.

  Emacs represents a coding system by a Lisp symbol that has a property
  `coding-system'.  But, before actually using the coding system, the
  information about it is set in a structure of type `struct
  coding_system' for rapid processing.  See section 6 for more details.

*/
/*** GENERAL NOTES on END-OF-LINE FORMAT ***

  How the end of a line is encoded depends on the operating system.
  For instance, Unix's format is just one byte of `line-feed' code,
  whereas DOS's format is a two-byte sequence of `carriage-return' and
  `line-feed' codes.  MacOS's format is usually one byte of
  `carriage-return'.

  Since text character encoding and end-of-line encoding are
  independent, any coding system described above can have any
  end-of-line format.  So Emacs has information about end-of-line
  format in each coding-system.  See section 6 for more details.

*/
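
/* Illustrative sketch, not part of the original source.  The three
   end-of-line formats described above can be told apart by looking at
   the first line terminator in a block of text.  The enum and the
   function name below are hypothetical stand-ins; the real file uses
   CODING_EOL_LF, CODING_EOL_CRLF and CODING_EOL_CR, and its own
   detection logic may differ.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical labels for the three formats plus "cannot tell yet".
enum sketch_eol
{
  SKETCH_EOL_UNDECIDED,
  SKETCH_EOL_LF,
  SKETCH_EOL_CRLF,
  SKETCH_EOL_CR
};

// Return the format of the first line ending found in BUF[0..LEN).
enum sketch_eol
sketch_detect_eol (const unsigned char *buf, size_t len)
{
  size_t i;

  for (i = 0; i < len; i++)
    {
      if (buf[i] == '\n')
        return SKETCH_EOL_LF;
      if (buf[i] == '\r')
        {
          // A CR at the very end may be the first half of a CRLF pair
          // whose LF has not arrived yet, so do not decide.
          if (i + 1 >= len)
            return SKETCH_EOL_UNDECIDED;
          return buf[i + 1] == '\n' ? SKETCH_EOL_CRLF : SKETCH_EOL_CR;
        }
    }
  return SKETCH_EOL_UNDECIDED;
}
```

   Because the character encoding and the end-of-line format are
   independent, a check like this can run on the raw bytes before any
   character decoding is chosen.  */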
/*** GENERAL NOTES on `detect_coding_XXX ()' functions ***

  These functions check if a text between SRC and SRC_END is encoded
  in the coding system category XXX.  Each returns an integer value in
  which appropriate flag bits for the category XXX are set.  The flag
  bits are defined in macros CODING_CATEGORY_MASK_XXX.  Below is the
  template for these functions.  If MULTIBYTEP is nonzero, 8-bit codes
  of the range 0x80..0x9F are in multibyte form.  */

detect_coding_emacs_mule (src, src_end, multibytep)
     unsigned char *src, *src_end;
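
/* Illustrative sketch, not part of the original source.  A detector
   in the style described above scans the bytes between SRC and
   SRC_END and returns either the flag bit of its category or 0.  The
   "seven-bit" category, its mask value and the function name are
   hypothetical; the real categories are named by the
   CODING_CATEGORY_MASK_XXX macros.

```c
#include <assert.h>

#define SKETCH_CATEGORY_MASK_SEVEN_BIT 0x01  // hypothetical flag bit

// Return the category mask if every byte in [SRC, SRC_END) is below
// 0x80, else return 0.
int
sketch_detect_seven_bit (const unsigned char *src,
                         const unsigned char *src_end)
{
  while (src < src_end)
    if (*src++ >= 0x80)
      return 0;
  return SKETCH_CATEGORY_MASK_SEVEN_BIT;
}
```

   Returning a mask rather than a boolean lets a caller OR together
   the results of several detectors and compare them against a
   priority list afterwards.  */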
/*** GENERAL NOTES on `decode_coding_XXX ()' functions ***

  These functions decode SRC_BYTES length of unibyte text at SOURCE
  encoded in CODING to Emacs' internal format.  The resulting
  multibyte text goes to a place pointed to by DESTINATION, the length
  of which should not exceed DST_BYTES.

  These functions set the information about original and decoded texts
  in the members `produced', `produced_char', `consumed', and
  `consumed_char' of the structure *CODING.  They also set the member
  `result' to one of CODING_FINISH_XXX indicating how the decoding
  finished.

  DST_BYTES zero means that the source area and destination area are
  overlapped, which means that we can produce a decoded text until it
  reaches the head of the not-yet-decoded source text.

  Below is a template for these functions.  */

decode_coding_XXX (coding, source, destination, src_bytes, dst_bytes)
     struct coding_system *coding;
     unsigned char *source, *destination;
     int src_bytes, dst_bytes;
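
/* Illustrative sketch, not part of the original source.  It shows the
   reporting convention described above on a deliberately tiny
   "decoder" that only turns CRLF into LF.  `struct sketch_coding',
   `enum sketch_result' and the function name are hypothetical
   stand-ins for `struct coding_system' and CODING_FINISH_XXX, and the
   overlapping case (DST_BYTES zero) is not handled here.

```c
#include <assert.h>

// Hypothetical stand-ins for CODING_FINISH_XXX.
enum sketch_result
{
  SKETCH_FINISH_NORMAL,
  SKETCH_FINISH_INSUFFICIENT_SRC,
  SKETCH_FINISH_INSUFFICIENT_DST
};

// Hypothetical subset of the members the notes above mention.
struct sketch_coding
{
  int produced, produced_char;
  int consumed, consumed_char;
  enum sketch_result result;
};

// Decode CRLF line endings to LF, reporting progress in *CODING.
void
sketch_decode_crlf (struct sketch_coding *coding,
                    const unsigned char *source,
                    unsigned char *destination,
                    int src_bytes, int dst_bytes)
{
  const unsigned char *src = source, *src_end = source + src_bytes;
  unsigned char *dst = destination, *dst_end = destination + dst_bytes;

  coding->produced_char = 0;
  coding->result = SKETCH_FINISH_NORMAL;
  while (src < src_end)
    {
      int step = 1;
      unsigned char c = *src;

      if (c == '\r')
        {
          if (src + 1 >= src_end)
            {
              // The LF may arrive in the next block; stop before the CR.
              coding->result = SKETCH_FINISH_INSUFFICIENT_SRC;
              break;
            }
          if (src[1] == '\n')
            c = '\n', step = 2;
        }
      if (dst >= dst_end)
        {
          coding->result = SKETCH_FINISH_INSUFFICIENT_DST;
          break;
        }
      *dst++ = c;
      src += step;
      coding->produced_char++;
    }
  // The input here is single-byte, so bytes and characters coincide.
  coding->consumed = coding->consumed_char = (int) (src - source);
  coding->produced = (int) (dst - destination);
}
```

   A caller can inspect `result' to decide whether to supply more
   input, enlarge the destination, or stop, and can resume at
   SOURCE + `consumed'.  */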
/*** GENERAL NOTES on `encode_coding_XXX ()' functions ***

  These functions encode SRC_BYTES length text at SOURCE from Emacs'
  internal multibyte format to CODING.  The resulting unibyte text
  goes to a place pointed to by DESTINATION, the length of which
  should not exceed DST_BYTES.

  These functions set the information about original and encoded texts
  in the members `produced', `produced_char', `consumed', and
  `consumed_char' of the structure *CODING.  They also set the member
  `result' to one of CODING_FINISH_XXX indicating how the encoding
  finished.

  DST_BYTES zero means that the source area and destination area are
  overlapped, which means that we can produce encoded text until it
  reaches the head of the not-yet-encoded source text.

  Below is a template for these functions.  */

encode_coding_XXX (coding, source, destination, src_bytes, dst_bytes)
     struct coding_system *coding;
     unsigned char *source, *destination;
     int src_bytes, dst_bytes;
/*** COMMONLY USED MACROS ***/

/* The following two macros ONE_MORE_BYTE and TWO_MORE_BYTES safely
   get one and two bytes from the source text, respectively.
   If there are not enough bytes in the source, they jump to
   `label_end_of_loop'.  The caller should set variables `coding',
   `src' and `src_end' to appropriate pointers in advance.  These
   macros are called from decoding routines `decode_coding_XXX', thus
   it is assumed that the source text is unibyte.  */
#define ONE_MORE_BYTE(c1)                                       \
  do {                                                          \
    if (src >= src_end)                                         \
      {                                                         \
        coding->result = CODING_FINISH_INSUFFICIENT_SRC;        \
        goto label_end_of_loop;                                 \
      }                                                         \
    c1 = *src++;                                                \
  } while (0)

#define TWO_MORE_BYTES(c1, c2)                                  \
  do {                                                          \
    if (src + 1 >= src_end)                                     \
      {                                                         \
        coding->result = CODING_FINISH_INSUFFICIENT_SRC;        \
        goto label_end_of_loop;                                 \
      }                                                         \
    c1 = *src++;                                                \
    c2 = *src++;                                                \
  } while (0)

/* Like ONE_MORE_BYTE, but 8-bit bytes of data at SRC are in multibyte
   form if MULTIBYTEP is nonzero.  */

#define ONE_MORE_BYTE_CHECK_MULTIBYTE(c1, multibytep)           \
  do {                                                          \
    if (src >= src_end)                                         \
      {                                                         \
        coding->result = CODING_FINISH_INSUFFICIENT_SRC;        \
        goto label_end_of_loop;                                 \
      }                                                         \
    c1 = *src++;                                                \
    if (multibytep && c1 == LEADING_CODE_8_BIT_CONTROL)         \
      c1 = *src++ - 0x20;                                       \
  } while (0)
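
/* Illustrative sketch, not part of the original source.  It shows the
   idiom the macros above rely on: check the bound, record why the
   loop stopped, and jump to a label placed after the loop.  The macro
   and function names are hypothetical, and a plain `truncated' flag
   stands in for `coding->result'.

```c
#include <assert.h>

// Hypothetical macro mirroring the bounds-check-then-jump idiom.  It
// expects `src', `src_end', `truncated' and `label_end_of_loop' to be
// provided by the enclosing function.
#define SKETCH_ONE_MORE_BYTE(c)         \
  do {                                  \
    if (src >= src_end)                 \
      {                                 \
        truncated = 1;                  \
        goto label_end_of_loop;         \
      }                                 \
    c = *src++;                         \
  } while (0)

// Count the complete two-byte pairs in SRC[0..LEN).  Set
// *TRUNCATED_OUT to 1 if the input ended in the middle of a pair.
int
sketch_count_pairs (const unsigned char *src, int len, int *truncated_out)
{
  const unsigned char *src_end = src + len;
  int pairs = 0, truncated = 0;
  int c1, c2;

  while (src < src_end)
    {
      SKETCH_ONE_MORE_BYTE (c1);
      SKETCH_ONE_MORE_BYTE (c2);
      (void) c1, (void) c2;
      pairs++;
    }
 label_end_of_loop:
  *truncated_out = truncated;
  return pairs;
}
```

   Wrapping the macro body in `do { ... } while (0)' makes each use
   behave as a single statement, so it stays safe inside an unbraced
   `if' or `else'.  */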
/* Set C to the next character at the source text pointed by `src'.
   If there are not enough characters in the source, jump to
   `label_end_of_loop'.  The caller should set variables `coding',
   `src', `src_end', and `translation_table' to appropriate pointers
   in advance.  This macro is used in encoding routines
   `encode_coding_XXX', thus it assumes that the source text is in
   multibyte form except for 8-bit characters.  8-bit characters are
   in multibyte form if coding->src_multibyte is nonzero, else they
   are represented by a single byte.  */

#define ONE_MORE_CHAR(c)                                        \
  do {                                                          \
    int len = src_end - src;                                    \
    int bytes;                                                  \
    if (len <= 0)                                               \
      {                                                         \
        coding->result = CODING_FINISH_INSUFFICIENT_SRC;        \
        goto label_end_of_loop;                                 \
      }                                                         \
    if (coding->src_multibyte                                   \
        || UNIBYTE_STR_AS_MULTIBYTE_P (src, len, bytes))        \
      c = STRING_CHAR_AND_LENGTH (src, len, bytes);             \
    else                                                        \
      c = *src, bytes = 1;                                      \
    if (!NILP (translation_table))                              \
      c = translate_char (translation_table, c, -1, 0, 0);      \
    src += bytes;                                               \
  } while (0)
/* Produce a multibyte form of character C to `dst'.  Jump to
   `label_end_of_loop' if there's not enough space at `dst'.

   If we are now in the middle of a composition sequence, the decoded
   character may be ALTCHAR (for the current composition).  In that
   case, the character goes to coding->cmp_data->data instead of
   `dst'.

   This macro is used in decoding routines.  */

#define EMIT_CHAR(c)                                            \
  do {                                                          \
    if (! COMPOSING_P (coding)                                  \
        || coding->composing == COMPOSITION_RELATIVE            \
        || coding->composing == COMPOSITION_WITH_RULE)          \
      {                                                         \
        int bytes = CHAR_BYTES (c);                             \
        if ((dst + bytes) > (dst_bytes ? dst_end : src))        \
          {                                                     \
            coding->result = CODING_FINISH_INSUFFICIENT_DST;    \
            goto label_end_of_loop;                             \
          }                                                     \
        dst += CHAR_STRING (c, dst);                            \
        coding->produced_char++;                                \
      }                                                         \
                                                                \
    if (COMPOSING_P (coding)                                    \
        && coding->composing != COMPOSITION_RELATIVE)           \
      {                                                         \
        CODING_ADD_COMPOSITION_COMPONENT (coding, c);           \
        coding->composition_rule_follows                        \
          = coding->composing != COMPOSITION_WITH_ALTCHARS;     \
      }                                                         \
  } while (0)
#define EMIT_ONE_BYTE(c)                                        \
  do {                                                          \
    if (dst >= (dst_bytes ? dst_end : src))                     \
      {                                                         \
        coding->result = CODING_FINISH_INSUFFICIENT_DST;        \
        goto label_end_of_loop;                                 \
      }                                                         \
    *dst++ = c;                                                 \
  } while (0)

#define EMIT_TWO_BYTES(c1, c2)                                  \
  do {                                                          \
    if (dst + 2 > (dst_bytes ? dst_end : src))                  \
      {                                                         \
        coding->result = CODING_FINISH_INSUFFICIENT_DST;        \
        goto label_end_of_loop;                                 \
      }                                                         \
    *dst++ = c1, *dst++ = c2;                                   \
  } while (0)

#define EMIT_BYTES(from, to)                                    \
  do {                                                          \
    if (dst + (to - from) > (dst_bytes ? dst_end : src))        \
      {                                                         \
        coding->result = CODING_FINISH_INSUFFICIENT_DST;        \
        goto label_end_of_loop;                                 \
      }                                                         \
    while (from < to) *dst++ = *from++;                         \
  } while (0)
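
/* Illustrative sketch, not part of the original source.  Every EMIT
   macro above compares `dst' against `(dst_bytes ? dst_end : src)'.
   The helper below isolates that test; its name and signature are
   hypothetical.

```c
#include <assert.h>

// Return 1 if NBYTES more bytes may be written at DST, else 0.  When
// DST_BYTES is zero the source and destination share one area, so
// output must stop at SRC, the head of the input not yet consumed.
int
sketch_room_for (const unsigned char *dst, const unsigned char *dst_end,
                 const unsigned char *src, int dst_bytes, int nbytes)
{
  const unsigned char *limit = dst_bytes ? dst_end : src;

  return dst + nbytes <= limit;
}
```

   With a separate destination the limit is fixed, whereas in the
   overlapping case it moves forward as the conversion consumes
   input.  */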
/*** 1. Preamble ***/

#include "composite.h"
#include "intervals.h"

#else  /* not emacs */

#endif /* not emacs */

Lisp_Object Qcoding_system, Qeol_type;
Lisp_Object Qbuffer_file_coding_system;
Lisp_Object Qpost_read_conversion, Qpre_write_conversion;
Lisp_Object Qno_conversion, Qundecided;
Lisp_Object Qcoding_system_history;
Lisp_Object Qsafe_chars;
Lisp_Object Qvalid_codes;

extern Lisp_Object Qinsert_file_contents, Qwrite_region;
Lisp_Object Qcall_process, Qcall_process_region, Qprocess_argument;
Lisp_Object Qstart_process, Qopen_network_stream;
Lisp_Object Qtarget_idx;

Lisp_Object Vselect_safe_coding_system_function;

int coding_system_require_warning;

/* Mnemonic string for each format of end-of-line.  */
Lisp_Object eol_mnemonic_unix, eol_mnemonic_dos, eol_mnemonic_mac;
/* Mnemonic string to indicate format of end-of-line is not yet
   decided.  */
Lisp_Object eol_mnemonic_undecided;

/* Format of end-of-line decided by system.  This is CODING_EOL_LF on
   Unix, CODING_EOL_CRLF on DOS/Windows, and CODING_EOL_CR on Mac.  */

/* Information about which coding system is safe for which chars.
   The value has the form (GENERIC-LIST . NON-GENERIC-ALIST).

   GENERIC-LIST is a list of generic coding systems which can encode
   any characters.

   NON-GENERIC-ALIST is an alist of non-generic coding systems vs the
   corresponding char table that contains safe chars.  */
Lisp_Object Vcoding_system_safe_chars;

Lisp_Object Vcoding_system_list, Vcoding_system_alist;

Lisp_Object Qcoding_system_p, Qcoding_system_error;

/* Coding system emacs-mule and raw-text are for converting only
   end-of-line format.  */
Lisp_Object Qemacs_mule, Qraw_text;

/* Coding-systems are handed between Emacs Lisp programs and C internal
   routines by the following three variables.  */
/* Coding-system for reading files and receiving data from process.  */
Lisp_Object Vcoding_system_for_read;
/* Coding-system for writing files and sending data to process.  */
Lisp_Object Vcoding_system_for_write;
/* Coding-system actually used in the latest I/O.  */
Lisp_Object Vlast_coding_system_used;

/* A vector of length 256 which contains information about special
   Latin codes (especially for dealing with Microsoft codes).  */
Lisp_Object Vlatin_extra_code_table;

/* Flag to inhibit code conversion of end-of-line format.  */
int inhibit_eol_conversion;

/* Flag to inhibit ISO2022 escape sequence detection.  */
int inhibit_iso_escape_detection;

/* Flag to make buffer-file-coding-system inherit from process-coding.  */
int inherit_process_coding_system;

/* Coding system to be used to encode text for terminal display.  */
struct coding_system terminal_coding;

/* Coding system to be used to encode text for terminal display when
   terminal coding system is nil.  */
struct coding_system safe_terminal_coding;

/* Coding system of what is sent from terminal keyboard.  */
struct coding_system keyboard_coding;

/* Default coding system to be used to write a file.  */
struct coding_system default_buffer_file_coding;

Lisp_Object Vfile_coding_system_alist;
Lisp_Object Vprocess_coding_system_alist;
Lisp_Object Vnetwork_coding_system_alist;

Lisp_Object Vlocale_coding_system;

Lisp_Object Qcoding_category, Qcoding_category_index;

/* List of symbols `coding-category-xxx' ordered by priority.  */
Lisp_Object Vcoding_category_list;

/* Table of coding categories (Lisp symbols).  */
Lisp_Object Vcoding_category_table;

/* Table of names of symbol for each coding-category.  */
char *coding_category_name[CODING_CATEGORY_IDX_MAX] = {
  "coding-category-emacs-mule",
  "coding-category-sjis",
  "coding-category-iso-7",
  "coding-category-iso-7-tight",
  "coding-category-iso-8-1",
  "coding-category-iso-8-2",
  "coding-category-iso-7-else",
  "coding-category-iso-8-else",
  "coding-category-ccl",
  "coding-category-big5",
  "coding-category-utf-8",
  "coding-category-utf-16-be",
  "coding-category-utf-16-le",
  "coding-category-raw-text",
  "coding-category-binary"
};

/* Table of pointers to coding systems corresponding to each coding
   category.  */
struct coding_system *coding_system_table[CODING_CATEGORY_IDX_MAX];

/* Table of coding category masks.  Nth element is a mask for a coding
   category of which priority is Nth.  */
int coding_priorities[CODING_CATEGORY_IDX_MAX];
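
/* Illustrative sketch, not part of the original source.  A table in
   which the Nth element is the mask of the Nth-priority category can
   be scanned front to back to pick the preferred category among those
   a detector reported.  The function name and the mask values used in
   any call are hypothetical.

```c
#include <assert.h>

// Return the index of the first entry in PRIORITIES[0..N) that shares
// a bit with DETECTED_MASK, or -1 if none does.
int
sketch_pick_category (const int *priorities, int n, int detected_mask)
{
  int i;

  for (i = 0; i < n; i++)
    if (priorities[i] & detected_mask)
      return i;
  return -1;
}
```

   Reordering the table changes which category wins when the text is
   valid in several of them, without touching the detectors.  */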

/* Flag to tell if we look up translation table on character code
   conversion.  */
Lisp_Object Venable_character_translation;
/* Standard translation table to look up on decoding (reading).  */
Lisp_Object Vstandard_translation_table_for_decode;
/* Standard translation table to look up on encoding (writing).  */
Lisp_Object Vstandard_translation_table_for_encode;

Lisp_Object Qtranslation_table;
Lisp_Object Qtranslation_table_id;
Lisp_Object Qtranslation_table_for_decode;
Lisp_Object Qtranslation_table_for_encode;

/* Alist of charsets vs revision number.  */
Lisp_Object Vcharset_revision_alist;

/* Default coding systems used for process I/O.  */
Lisp_Object Vdefault_process_coding_system;

/* Char table for translating Quail and self-inserting input.  */
Lisp_Object Vtranslation_table_for_input;

/* Global flag to tell that we can't call post-read-conversion and
   pre-write-conversion functions.  Usually the value is zero, but it
   is set to 1 temporarily while such functions are running.  This is
   to avoid infinite recursive calls.  */
static int inhibit_pre_post_conversion;

Lisp_Object Qchar_coding_system;

/* Return `safe-chars' property of CODING_SYSTEM (symbol).  Don't check
   the validity of CODING_SYSTEM.  */

Lisp_Object
coding_safe_chars (coding_system)
     Lisp_Object coding_system;
{
  Lisp_Object coding_spec, plist, safe_chars;

  coding_spec = Fget (coding_system, Qcoding_system);
  plist = XVECTOR (coding_spec)->contents[3];
  safe_chars = Fplist_get (plist, Qsafe_chars);
  return (CHAR_TABLE_P (safe_chars) ? safe_chars : Qt);
}

#define CODING_SAFE_CHAR_P(safe_chars, c) \
  (EQ (safe_chars, Qt) || !NILP (CHAR_TABLE_REF (safe_chars, c)))

/*** 2. Emacs internal format (emacs-mule) handlers ***/

/* Emacs' internal format for representation of multiple character
   sets is a kind of multi-byte encoding, i.e. characters are
   represented by variable-length sequences of one-byte codes.

   ASCII characters and control characters (e.g. `tab', `newline') are
   represented by one-byte sequences which are their ASCII codes, in
   the range 0x00 through 0x7F.

   8-bit characters of the range 0x80..0x9F are represented by
   two-byte sequences of LEADING_CODE_8_BIT_CONTROL and (their 8-bit
   code + 0x20).

   8-bit characters of the range 0xA0..0xFF are represented by
   one-byte sequences which are their 8-bit code.

   The other characters are represented by a sequence of `base
   leading-code', optional `extended leading-code', and one or two
   `position-code's.  The length of the sequence is determined by the
   base leading-code.  Leading-code takes the range 0x81 through 0x9D,
   whereas extended leading-code and position-code take the range 0xA0
   through 0xFF.  See `charset.h' for more details about leading-code
   and position-code.

   --- CODE RANGE of Emacs' internal format ---
   character set        range
   -------------        -----
   ascii                0x00..0x7F
   eight-bit-control    LEADING_CODE_8_BIT_CONTROL + 0xA0..0xBF
   eight-bit-graphic    0xA0..0xFF
   ELSE                 0x81..0x9D + [0xA0..0xFF]+
   ---------------------------------------------

   As this is the internal character representation, the format is
   usually not used externally (i.e. in a file or in data sent to a
   process).  But, it is possible to have a text externally in this
   format (i.e. by encoding by the coding system `emacs-mule').

   In that case, a sequence of one-byte codes has a slightly different
   form.

   Firstly, all characters in eight-bit-control are represented by
   one-byte sequences which are their 8-bit code.

   Next, character composition data are represented by the byte
   sequence of the form: 0x80 METHOD BYTES CHARS COMPONENT ...,
   where,
        METHOD is 0xF0 plus one of composition method (enum
        composition_method),

        BYTES is 0xA0 plus the byte length of these composition data,

        CHARS is 0xA0 plus the number of characters composed by these
        data,

        COMPONENTs are characters of multibyte form or composition
        rules encoded as two bytes of ASCII codes.

   In addition, for backward compatibility, the following formats are
   also recognized as composition data on decoding.

   0x80 MSEQ ...
   0x80 0xFF MSEQ RULE MSEQ RULE ... MSEQ

   Here,
        MSEQ is a multibyte form but in this special format:
          ASCII: 0xA0 ASCII_CODE+0x80,
          other: LEADING_CODE+0x20 FOLLOWING-BYTE ...,
        RULE is a one byte code of the range 0xA0..0xF0 that
        represents a composition rule.
  */
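
/* Illustrative sketch, not part of the original source.  It reads the
   four-byte header `0x80 METHOD BYTES CHARS' described above by
   undoing the 0xF0 and 0xA0 offsets.  The struct and function names
   are hypothetical, and the sketch does not check METHOD against the
   valid composition methods because their values are not shown in
   this part of the file.

```c
#include <assert.h>

// Hypothetical holder for the three decoded header fields.
struct sketch_cmp_header
{
  int method;    // METHOD byte minus 0xF0
  int data_len;  // BYTES byte minus 0xA0
  int nchars;    // CHARS byte minus 0xA0
};

// Parse the header at P[0..LEN).  Return 1 and fill *OUT on success,
// or 0 if the bytes cannot be a composition header.
int
sketch_parse_cmp_header (const unsigned char *p, int len,
                         struct sketch_cmp_header *out)
{
  if (len < 4 || p[0] != 0x80)
    return 0;
  if (p[1] < 0xF0 || p[2] < 0xA0 || p[3] < 0xA0)
    return 0;
  out->method = p[1] - 0xF0;
  out->data_len = p[2] - 0xA0;
  out->nchars = p[3] - 0xA0;
  return 1;
}
```

   Because each field is stored with an offset that keeps the byte at
   0xA0 or above, no header byte can be mistaken for ASCII text.  */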

enum emacs_code_class_type emacs_code_class[256];

/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
   Check if a text is encoded in Emacs' internal format.  If it is,
   return CODING_CATEGORY_MASK_EMACS_MULE, else return 0.  */

detect_coding_emacs_mule (src, src_end, multibytep)
     unsigned char *src, *src_end;
  /* Dummy for ONE_MORE_BYTE.  */
  struct coding_system dummy_coding;
  struct coding_system *coding = &dummy_coding;
      ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
              ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
          if (c == ISO_CODE_ESC || c == ISO_CODE_SI || c == ISO_CODE_SO)
          else if (c >= 0x80 && c < 0xA0)
                /* Old leading code for a composite character.  */
              unsigned char *src_base = src - 1;
              if (!UNIBYTE_STR_AS_MULTIBYTE_P (src_base, src_end - src_base,
              src = src_base + bytes;
  return CODING_CATEGORY_MASK_EMACS_MULE;

/* Record the starting position START and METHOD of one composition.  */

#define CODING_ADD_COMPOSITION_START(coding, start, method)     \
    struct composition_data *cmp_data = coding->cmp_data;       \
    int *data = cmp_data->data + cmp_data->used;                \
    coding->cmp_data_start = cmp_data->used;                    \
    data[1] = cmp_data->char_offset + start;                    \
    data[3] = (int) method;                                     \
    cmp_data->used += 4;                                        \

/* Record the ending position END of the current composition.  */

#define CODING_ADD_COMPOSITION_END(coding, end)                 \
    struct composition_data *cmp_data = coding->cmp_data;       \
    int *data = cmp_data->data + coding->cmp_data_start;        \
    data[0] = cmp_data->used - coding->cmp_data_start;          \
    data[2] = cmp_data->char_offset + end;                      \

/* Record one COMPONENT (alternate character or composition rule).  */

#define CODING_ADD_COMPOSITION_COMPONENT(coding, component)             \
    coding->cmp_data->data[coding->cmp_data->used++] = component;       \
    if (coding->cmp_data->used - coding->cmp_data_start                 \
        == COMPOSITION_DATA_MAX_BUNCH_LENGTH)                           \
        CODING_ADD_COMPOSITION_END (coding, coding->produced_char);     \
        coding->composing = COMPOSITION_NO;                             \

/* Get one byte from a data pointed by SRC and increment SRC.  If SRC
   is not less than SRC_END, return -1 without incrementing SRC.  */

#define SAFE_ONE_MORE_BYTE() (src >= src_end ? -1 : *src++)

/* Decode a character represented as a component of composition
   sequence of Emacs 20 style at SRC.  Set C to that character, store
   its multibyte form sequence at P, and set P to the end of that
   sequence.  If no valid character is found, set C to -1.  */

#define DECODE_EMACS_MULE_COMPOSITION_CHAR(c, p)                \
    c = SAFE_ONE_MORE_BYTE ();                                  \
    if (CHAR_HEAD_P (c))                                        \
    else if (c == 0xA0)                                         \
        c = SAFE_ONE_MORE_BYTE ();                              \
    else if (BASE_LEADING_CODE_P (c - 0x20))                    \
        unsigned char *p0 = p;                                  \
        bytes = BYTES_BY_CHAR_HEAD (c);                         \
            c = SAFE_ONE_MORE_BYTE ();                          \
        if (UNIBYTE_STR_AS_MULTIBYTE_P (p0, p - p0, bytes)      \
            || (coding->flags /* We are recovering a file.  */  \
                && p0[0] == LEADING_CODE_8_BIT_CONTROL          \
                && ! CHAR_HEAD_P (p0[1])))                      \
          c = STRING_CHAR (p0, bytes);                          \

/* Decode a composition rule represented as a component of composition
   sequence of Emacs 20 style at SRC.  Set C to the rule.  If no
   valid rule is found, set C to -1.  */

#define DECODE_EMACS_MULE_COMPOSITION_RULE(c)           \
    c = SAFE_ONE_MORE_BYTE ();                          \
    if (c < 0 || c >= 81)                               \
        gref = c / 9, nref = c % 9;                     \
        c = COMPOSITION_ENCODE_RULE (gref, nref);       \
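
/* Illustrative sketch, not part of the original source.  The rule
   macro above splits a value below 81 into two numbers with `c / 9'
   and `c % 9'.  The sketch assumes that the rule byte, which the
   section 2 comment places in the range 0xA0..0xF0, is first reduced
   by 0xA0; that subtraction is an inference from the stated range and
   is not visible in the surviving macro lines.  The function name is
   hypothetical, and the final packing done by COMPOSITION_ENCODE_RULE
   is left out.

```c
#include <assert.h>

// Split an Emacs 20 style rule byte (0xA0..0xF0) into two reference
// points, each in 0..8.  Return 1 on success, 0 if BYTE is out of
// range.
int
sketch_decode_rule_byte (int byte, int *gref, int *nref)
{
  int c = byte - 0xA0;

  if (c < 0 || c >= 81)
    return 0;
  *gref = c / 9;
  *nref = c % 9;
  return 1;
}
```

   The bound 81 is 9 * 9, so every in-range value maps to exactly one
   pair of reference points.  */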

/* Decode composition sequence encoded by `emacs-mule' at the source
   pointed by SRC.  SRC_END is the end of source.  Store information
   of the composition in CODING->cmp_data.

   For backward compatibility, decode also a composition sequence of
   Emacs 20 style.  In that case, the composition sequence contains
   characters that should be extracted into a buffer or string.  Store
   those characters at *DESTINATION in multibyte form.

   If we encounter an invalid byte sequence, return 0.
   If we encounter an insufficient source or destination, or
   insufficient space in CODING->cmp_data, return 1.
   Otherwise, return consumed bytes in the source.
*/
decode_composition_emacs_mule (coding, src, src_end,
                               destination, dst_end, dst_bytes)
     struct coding_system *coding;
     unsigned char *src, *src_end, **destination, *dst_end;
  unsigned char *dst = *destination;
  int method, data_len, nchars;
  unsigned char *src_base = src++;
  /* Store components of composition.  */
  int component[COMPOSITION_DATA_MAX_BUNCH_LENGTH];
  /* Store multibyte form of characters to be composed.  This is for
     Emacs 20 style composition sequence.  */
  unsigned char buf[MAX_COMPOSITION_COMPONENTS * MAX_MULTIBYTE_LENGTH];
  unsigned char *bufp = buf;
  int c, i, gref, nref;

  if (coding->cmp_data->used + COMPOSITION_DATA_MAX_BUNCH_LENGTH
      >= COMPOSITION_DATA_SIZE)
      coding->result = CODING_FINISH_INSUFFICIENT_CMP;
  if (c - 0xF0 >= COMPOSITION_RELATIVE
      && c - 0xF0 <= COMPOSITION_WITH_RULE_ALTCHARS)
      with_rule = (method == COMPOSITION_WITH_RULE
                   || method == COMPOSITION_WITH_RULE_ALTCHARS);
          || src_base + data_len > src_end)
      for (ncomponent = 0; src < src_base + data_len; ncomponent++)
          /* If it is longer than this, it can't be valid.  */
          if (ncomponent >= COMPOSITION_DATA_MAX_BUNCH_LENGTH)
          if (ncomponent % 2 && with_rule)
              ONE_MORE_BYTE (gref);
              ONE_MORE_BYTE (nref);
              c = COMPOSITION_ENCODE_RULE (gref, nref);
              if (UNIBYTE_STR_AS_MULTIBYTE_P (src, src_end - src, bytes)
                  || (coding->flags /* We are recovering a file.  */
                      && src[0] == LEADING_CODE_8_BIT_CONTROL
                      && ! CHAR_HEAD_P (src[1])))
                c = STRING_CHAR (src, bytes);
          component[ncomponent] = c;
      /* This may be an old Emacs 20 style format.  See the comment at
         the section 2 of this file.  */
      while (src < src_end && !CHAR_HEAD_P (*src)) src++;
          && !(coding->mode & CODING_MODE_LAST_BLOCK))
        goto label_end_of_loop;
          method = COMPOSITION_RELATIVE;
          for (ncomponent = 0; ncomponent < MAX_COMPOSITION_COMPONENTS;)
              DECODE_EMACS_MULE_COMPOSITION_CHAR (c, bufp);
              component[ncomponent++] = c;
          method = COMPOSITION_WITH_RULE;
          DECODE_EMACS_MULE_COMPOSITION_CHAR (c, bufp);
               ncomponent < MAX_COMPOSITION_COMPONENTS * 2 - 1;)
              DECODE_EMACS_MULE_COMPOSITION_RULE (c);
              component[ncomponent++] = c;
              DECODE_EMACS_MULE_COMPOSITION_CHAR (c, bufp);
              component[ncomponent++] = c;
          nchars = (ncomponent + 1) / 2;
  if (buf == bufp || dst + (bufp - buf) <= (dst_bytes ? dst_end : src))
      CODING_ADD_COMPOSITION_START (coding, coding->produced_char, method);
      for (i = 0; i < ncomponent; i++)
        CODING_ADD_COMPOSITION_COMPONENT (coding, component[i]);
      CODING_ADD_COMPOSITION_END (coding, coding->produced_char + nchars);
          unsigned char *p = buf;
          EMIT_BYTES (p, bufp);
          *destination += bufp - buf;
          coding->produced_char += nchars;
      return (src - src_base);

/* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions".  */

decode_coding_emacs_mule (coding, source, destination, src_bytes, dst_bytes)
     struct coding_system *coding;
     unsigned char *source, *destination;
     int src_bytes, dst_bytes;
  unsigned char *src = source;
  unsigned char *src_end = source + src_bytes;
  unsigned char *dst = destination;
  unsigned char *dst_end = destination + dst_bytes;
  /* SRC_BASE remembers the start position in source in each loop.
     The loop will be exited when there's not enough source code, or
     when there's not enough destination area to produce a
     character.  */
  unsigned char *src_base;

  coding->produced_char = 0;
  while ((src_base = src) < src_end)
      unsigned char tmp[MAX_MULTIBYTE_LENGTH], *p;
          if (coding->eol_type == CODING_EOL_CR)
          else if (coding->eol_type == CODING_EOL_CRLF)
          coding->produced_char++;
      else if (*src == '\n')
          if ((coding->eol_type == CODING_EOL_CR
               || coding->eol_type == CODING_EOL_CRLF)
              && coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
              coding->result = CODING_FINISH_INCONSISTENT_EOL;
              goto label_end_of_loop;
          coding->produced_char++;
      else if (*src == 0x80 && coding->cmp_data)
          /* Start of composition data.  */
          int consumed = decode_composition_emacs_mule (coding, src, src_end,
            goto label_end_of_loop;
          else if (consumed > 0)
          bytes = CHAR_STRING (*src, tmp);
      else if (UNIBYTE_STR_AS_MULTIBYTE_P (src, src_end - src, bytes)
               || (coding->flags /* We are recovering a file.  */
                   && src[0] == LEADING_CODE_8_BIT_CONTROL
                   && ! CHAR_HEAD_P (src[1])))
          bytes = CHAR_STRING (*src, tmp);
      if (dst + bytes >= (dst_bytes ? dst_end : src))
          coding->result = CODING_FINISH_INSUFFICIENT_DST;
      while (bytes--) *dst++ = *p++;
      coding->produced_char++;
  coding->consumed = coding->consumed_char = src_base - source;
  coding->produced = dst - destination;

/* Encode composition data stored at DATA into a special byte sequence
   starting with 0x80.  Update CODING->cmp_data_start and maybe
   CODING->cmp_data for the next call.  */

#define ENCODE_COMPOSITION_EMACS_MULE(coding, data)             \
    unsigned char buf[1024], *p0 = buf, *p;                     \
    int len = data[0];                                          \
    buf[1] = 0xF0 + data[3];    /* METHOD */                    \
    buf[3] = 0xA0 + (data[2] - data[1]); /* COMPOSED-CHARS */   \
    if (data[3] == COMPOSITION_WITH_RULE                        \
        || data[3] == COMPOSITION_WITH_RULE_ALTCHARS)           \
        p += CHAR_STRING (data[4], p);                          \
        for (i = 5; i < len; i += 2)                            \
            COMPOSITION_DECODE_RULE (data[i], gref, nref);      \
            *p++ = 0x20 + gref;                                 \
            *p++ = 0x20 + nref;                                 \
            p += CHAR_STRING (data[i + 1], p);                  \
        for (i = 4; i < len; i++)                               \
          p += CHAR_STRING (data[i], p);                        \
    buf[2] = 0xA0 + (p - buf);  /* COMPONENTS-BYTES */          \
    if (dst + (p - buf) + 4 > (dst_bytes ? dst_end : src))      \
        coding->result = CODING_FINISH_INSUFFICIENT_DST;        \
        goto label_end_of_loop;                                 \
    coding->cmp_data_start += data[0];                          \
    if (coding->cmp_data_start == coding->cmp_data->used        \
        && coding->cmp_data->next)                              \
        coding->cmp_data = coding->cmp_data->next;              \
        coding->cmp_data_start = 0;                             \

static void encode_eol P_ ((struct coding_system *, const unsigned char *,
                            unsigned char *, int, int));

encode_coding_emacs_mule (coding, source, destination, src_bytes, dst_bytes)
     struct coding_system *coding;
     unsigned char *source, *destination;
     int src_bytes, dst_bytes;
  unsigned char *src = source;
  unsigned char *src_end = source + src_bytes;
  unsigned char *dst = destination;
  unsigned char *dst_end = destination + dst_bytes;
  unsigned char *src_base;
  Lisp_Object translation_table;

  translation_table = Qnil;

  /* Optimization for the case that there's no composition.  */
  if (!coding->cmp_data || coding->cmp_data->used == 0)
      encode_eol (coding, source, destination, src_bytes, dst_bytes);

  char_offset = coding->cmp_data->char_offset;
  data = coding->cmp_data->data + coding->cmp_data_start;
      /* If SRC starts a composition, encode the information about the
         composition in advance.  */
      if (coding->cmp_data_start < coding->cmp_data->used
          && char_offset + coding->consumed_char == data[1])
          ENCODE_COMPOSITION_EMACS_MULE (coding, data);
          char_offset = coding->cmp_data->char_offset;
          data = coding->cmp_data->data + coding->cmp_data_start;
      if (c == '\n' && (coding->eol_type == CODING_EOL_CRLF
                        || coding->eol_type == CODING_EOL_CR))
          if (coding->eol_type == CODING_EOL_CRLF)
            EMIT_TWO_BYTES ('\r', c);
            EMIT_ONE_BYTE ('\r');
      else if (SINGLE_BYTE_CHAR_P (c))
          if (coding->flags && ! ASCII_BYTE_P (c))
              /* As we are auto saving, retain the multibyte form for
                 8-bit characters.  */
              unsigned char buf[MAX_MULTIBYTE_LENGTH];
              int bytes = CHAR_STRING (c, buf);
                EMIT_ONE_BYTE (buf[0]);
                  EMIT_TWO_BYTES (buf[0], buf[1]);
        EMIT_BYTES (src_base, src);
      coding->consumed_char++;
  coding->consumed = src_base - source;
  coding->produced = coding->produced_char = dst - destination;
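
/* Illustrative sketch, not part of the original source.  The encoder
   above replaces each newline with CR LF or CR depending on
   `coding->eol_type'.  The helper below shows that rewrite on plain
   bytes; the enum, the function name and the -1 return convention are
   hypothetical and stand in for CODING_EOL_XXX and the
   CODING_FINISH_INSUFFICIENT_DST result.

```c
#include <assert.h>

// Hypothetical labels for the output end-of-line format.
enum sketch_eol_out { SKETCH_OUT_LF, SKETCH_OUT_CRLF, SKETCH_OUT_CR };

// Copy SRC[0..SRC_BYTES) to DST, rewriting each '\n' for the requested
// format.  Return the number of bytes written, or -1 if DST_BYTES is
// too small.
int
sketch_encode_eol (const unsigned char *src, int src_bytes,
                   unsigned char *dst, int dst_bytes,
                   enum sketch_eol_out eol)
{
  int i, n = 0;

  for (i = 0; i < src_bytes; i++)
    {
      unsigned char c = src[i];
      int need = (c == '\n' && eol == SKETCH_OUT_CRLF) ? 2 : 1;

      if (n + need > dst_bytes)
        return -1;
      if (c == '\n' && eol != SKETCH_OUT_LF)
        {
          dst[n++] = '\r';
          if (eol == SKETCH_OUT_CRLF)
            dst[n++] = '\n';
        }
      else
        dst[n++] = c;
    }
  return n;
}
```

   Only the CRLF format can make the output longer than the input,
   which is why the space check depends on the format.  */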

/*** 3. ISO2022 handlers ***/

/* The following note describes the coding system ISO2022 briefly.
   Since the intention of this note is to help understand the
   functions in this file, some parts are NOT ACCURATE or are OVERLY
   SIMPLIFIED.  For thorough understanding, please refer to the
   original document of ISO2022.  This is equivalent to the standard
   ECMA-35, obtainable from <URL:http://www.ecma.ch/> (*).

   ISO2022 provides many mechanisms to encode several character sets
   in 7-bit and 8-bit environments.  For 7-bit environments, all text
   is encoded using bytes less than 128.  This may make the encoded
   text a little bit longer, but the text passes more easily through
   several types of gateway, some of which strip off the MSB (Most
   Significant Bit).

   There are two kinds of character sets: control character sets and
   graphic character sets.  The former contain control characters such
   as `newline' and `escape' to provide control functions (control
   functions are also provided by escape sequences).  The latter
   contain graphic characters such as 'A' and '-'.  Emacs recognizes
   two control character sets and many graphic character sets.

   Graphic character sets are classified into one of the following
   four classes, according to the number of bytes (DIMENSION) and
   number of characters in one dimension (CHARS) of the set:
   - DIMENSION1_CHARS94
   - DIMENSION1_CHARS96
   - DIMENSION2_CHARS94
   - DIMENSION2_CHARS96

   In addition, each character set is assigned an identification tag,
   unique for each set, called the "final character" (denoted as <F>
   hereafter).  The <F> of each character set is decided by ECMA(*)
   when it is registered in ISO.  The code range of <F> is 0x30..0x7F
   (0x30..0x3F are for private use only).

   Note (*): ECMA = European Computer Manufacturers Association

   Here are examples of graphic character sets [NAME(<F>)]:
   o DIMENSION1_CHARS94 -- ASCII('B'), right-half-of-JISX0201('I'), ...
   o DIMENSION1_CHARS96 -- right-half-of-ISO8859-1('A'), ...
   o DIMENSION2_CHARS94 -- GB2312('A'), JISX0208('B'), ...
   o DIMENSION2_CHARS96 -- none for the moment

   A code area (1 byte=8 bits) is divided into 4 areas, C0, GL, C1, and GR.
        C0 [0x00..0x1F] -- control character plane 0
        GL [0x20..0x7F] -- graphic character plane 0
        C1 [0x80..0x9F] -- control character plane 1
        GR [0xA0..0xFF] -- graphic character plane 1

   A control character set is directly designated and invoked to C0 or
   C1 by an escape sequence.  The most common case is that:
   - ISO646's control character set is designated/invoked to C0, and
   - ISO6429's control character set is designated/invoked to C1,
   and usually these designations/invocations are omitted in encoded
   text.  In a 7-bit environment, only C0 can be used, and a control
   character for C1 is encoded by an appropriate escape sequence to
   fit into the environment.  All control characters for C1 are
   defined to have corresponding escape sequences.

   A graphic character set is at first designated to one of four
   graphic registers (G0 through G3), then these graphic registers are
   invoked to GL or GR.  These designations and invocations can be
   done independently.  The most common case is that G0 is invoked to
   GL, G1 is invoked to GR, and ASCII is designated to G0.  Usually
   these invocations and designations are omitted in encoded text.
   In a 7-bit environment, only GL can be used.

   When a graphic character set of CHARS94 is invoked to GL, codes
   0x20 and 0x7F of the GL area work as control characters SPACE and
   DEL respectively, and codes 0xA0 and 0xFF of the GR area should not
   be used.

   There are two ways of invocation: locking-shift and single-shift.
   With locking-shift, the invocation lasts until the next different
   invocation, whereas with single-shift, the invocation affects the
   following character only and doesn't affect the locking-shift
   state.  Invocations are done by the following control characters or
   escape sequences:

   ----------------------------------------------------------------------
   abbrev  function                 cntrl escape seq  description
   ----------------------------------------------------------------------
   SI/LS0  (shift-in)               0x0F  none        invoke G0 into GL
   SO/LS1  (shift-out)              0x0E  none        invoke G1 into GL
   LS2     (locking-shift-2)        none  ESC 'n'     invoke G2 into GL
   LS3     (locking-shift-3)        none  ESC 'o'     invoke G3 into GL
   LS1R    (locking-shift-1 right)  none  ESC '~'     invoke G1 into GR (*)
   LS2R    (locking-shift-2 right)  none  ESC '}'     invoke G2 into GR (*)
   LS3R    (locking-shift-3 right)  none  ESC '|'     invoke G3 into GR (*)
   SS2     (single-shift-2)         0x8E  ESC 'N'     invoke G2 for one char
   SS3     (single-shift-3)         0x8F  ESC 'O'     invoke G3 for one char
   ----------------------------------------------------------------------
   (*) These are not used by any known coding system.

   Control characters for these functions are defined by macros
   ISO_CODE_XXX in `coding.h'.

   Designations are done by the following escape sequences:
   ----------------------------------------------------------------------
   escape sequence     description
   ----------------------------------------------------------------------
   ESC '(' <F>         designate DIMENSION1_CHARS94<F> to G0
   ESC ')' <F>         designate DIMENSION1_CHARS94<F> to G1
   ESC '*' <F>         designate DIMENSION1_CHARS94<F> to G2
   ESC '+' <F>         designate DIMENSION1_CHARS94<F> to G3
   ESC ',' <F>         designate DIMENSION1_CHARS96<F> to G0 (*)
   ESC '-' <F>         designate DIMENSION1_CHARS96<F> to G1
   ESC '.' <F>         designate DIMENSION1_CHARS96<F> to G2
   ESC '/' <F>         designate DIMENSION1_CHARS96<F> to G3
   ESC '$' '(' <F>     designate DIMENSION2_CHARS94<F> to G0 (**)
   ESC '$' ')' <F>     designate DIMENSION2_CHARS94<F> to G1
   ESC '$' '*' <F>     designate DIMENSION2_CHARS94<F> to G2
   ESC '$' '+' <F>     designate DIMENSION2_CHARS94<F> to G3
   ESC '$' ',' <F>     designate DIMENSION2_CHARS96<F> to G0 (*)
   ESC '$' '-' <F>     designate DIMENSION2_CHARS96<F> to G1
   ESC '$' '.' <F>     designate DIMENSION2_CHARS96<F> to G2
   ESC '$' '/' <F>     designate DIMENSION2_CHARS96<F> to G3
   ----------------------------------------------------------------------

   In this list, "DIMENSION1_CHARS94<F>" means a graphic character set
   of dimension 1, chars 94, and final character <F>, etc...

   Note (*): Although these designations are not allowed in ISO2022,
   Emacs accepts them on decoding, and produces them on encoding
   CHARS96 character sets in a coding system which is characterized as
   7-bit environment, non-locking-shift, and non-single-shift.

   Note (**): If <F> is '@', 'A', or 'B', the intermediate character
   '(' can be omitted.  We refer to this as "short-form" hereafter.

   Now you may notice that there are a lot of ways of encoding the
   same multilingual text in ISO2022.  Actually, there exist many
   coding systems such as Compound Text (used in X11's inter client
   communication), ISO-2022-JP (used in Japanese Internet), ISO-2022-KR
   (used in Korean Internet), EUC (Extended UNIX Code, used in Asian
   localized platforms), and all of these are variants of ISO2022.

   In addition to the above, Emacs handles two more kinds of escape
   sequences: ISO6429's direction specification and Emacs' private
   sequence for specifying character composition.

   ISO6429's direction specification takes the following form:
        o CSI ']'      -- end of the current direction
        o CSI '0' ']'  -- end of the current direction
        o CSI '1' ']'  -- start of left-to-right text
        o CSI '2' ']'  -- start of right-to-left text
   The control character CSI (0x9B: control sequence introducer) is
   abbreviated to the escape sequence ESC '[' in a 7-bit environment.

   Character composition specification takes the following form:
        o ESC '0' -- start relative composition
        o ESC '1' -- end composition
        o ESC '2' -- start rule-base composition (*)
        o ESC '3' -- start relative composition with alternate chars (**)
        o ESC '4' -- start rule-base composition with alternate chars (**)
   Since these are not standard escape sequences of any ISO standard,
   the use of them with these meanings is restricted to Emacs only.

   (*) This form is used only in Emacs 20.5 and older versions,
   but the newer versions can safely decode it.
   (**) This form is used only in Emacs 21.1 and newer versions,
   and the older versions can't decode it.

   Here's a list of example usages of these composition escape
   sequences (categorized by `enum composition_method').

   COMPOSITION_RELATIVE:
        ESC 0 CHAR [ CHAR ] ESC 1
   COMPOSITION_WITH_RULE:
        ESC 2 CHAR [ RULE CHAR ] ESC 1
   COMPOSITION_WITH_ALTCHARS:
        ESC 3 ALTCHAR [ ALTCHAR ] ESC 0 CHAR [ CHAR ] ESC 1
   COMPOSITION_WITH_RULE_ALTCHARS:
        ESC 4 ALTCHAR [ RULE ALTCHAR ] ESC 0 CHAR [ CHAR ] ESC 1 */
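
/* Example (illustrative only, not part of Emacs): the designation
   table in the note above can be read as a small function from
   (register, dimension, chars, <F>) to an escape sequence.  The
   sketch below is a minimal standalone version of that mapping; the
   name iso2022_designation and its parameters are hypothetical, and
   the short-form rule is applied only to G0, as Note (**) describes.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: write the designation sequence for a graphic
// character set into OUT (at least 4 bytes) and return its length.
static size_t
iso2022_designation (int reg, int dimension, int chars,
                     unsigned char final_char, int short_form,
                     unsigned char *out)
{
  static const char intermediate_94[] = "()*+";
  static const char intermediate_96[] = ",-./";
  size_t n = 0;

  assert (reg >= 0 && reg <= 3);
  assert (dimension == 1 || dimension == 2);
  assert (chars == 94 || chars == 96);

  out[n++] = 0x1B;              // ESC
  if (dimension == 2)
    out[n++] = '$';
  if (chars == 96)
    out[n++] = (unsigned char) intermediate_96[reg];
  else if (!(dimension == 2 && short_form && reg == 0
             && final_char >= '@' && final_char <= 'B'))
    out[n++] = (unsigned char) intermediate_94[reg];
  out[n++] = final_char;
  return n;
}
```

   With short_form nonzero, designating JISX0208 ('B') to G0 yields
   the three bytes ESC '$' 'B'; with short_form zero it yields the
   four bytes ESC '$' '(' 'B'.  */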

enum iso_code_class_type iso_code_class[256];

#define CHARSET_OK(idx, charset, c) \
  (coding_system_table[idx] \
   && (charset == CHARSET_ASCII \
       || (safe_chars = coding_safe_chars (coding_system_table[idx]->symbol), \
           CODING_SAFE_CHAR_P (safe_chars, c))) \
   && (CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding_system_table[idx], \
                                              charset) \
       != CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION))

#define SHIFT_OUT_OK(idx) \
  (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding_system_table[idx], 1) >= 0)

#define COMPOSITION_OK(idx) \
  (coding_system_table[idx]->composing != COMPOSITION_DISABLED)

/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
   Check if a text is encoded in ISO2022.  If it is, return an
   integer in which the appropriate flag bits, any of:
        CODING_CATEGORY_MASK_ISO_7
        CODING_CATEGORY_MASK_ISO_7_TIGHT
        CODING_CATEGORY_MASK_ISO_8_1
        CODING_CATEGORY_MASK_ISO_8_2
        CODING_CATEGORY_MASK_ISO_7_ELSE
        CODING_CATEGORY_MASK_ISO_8_ELSE
   are set.  If a code which should never appear in ISO2022 is found,
   return 0.  */

detect_coding_iso2022 (src, src_end, multibytep)
     unsigned char *src, *src_end;
  int mask = CODING_CATEGORY_MASK_ISO;
  int reg[4], shift_out = 0, single_shifting = 0;
  /* Dummy for ONE_MORE_BYTE.  */
  struct coding_system dummy_coding;
  struct coding_system *coding = &dummy_coding;
  Lisp_Object safe_chars;

  reg[0] = CHARSET_ASCII, reg[1] = reg[2] = reg[3] = -1;
  while (mask && src < src_end)
      ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
          if (inhibit_iso_escape_detection)
          single_shifting = 0;
          ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
          if (c >= '(' && c <= '/')
              /* Designation sequence for a charset of dimension 1.  */
              ONE_MORE_BYTE_CHECK_MULTIBYTE (c1, multibytep);
              if (c1 < ' ' || c1 >= 0x80
                  || (charset = iso_charset_table[0][c >= ','][c1]) < 0)
                /* Invalid designation sequence.  Just ignore.  */
              reg[(c - '(') % 4] = charset;
              /* Designation sequence for a charset of dimension 2.  */
              ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
              if (c >= '@' && c <= 'B')
                /* Designation for JISX0208.1978, GB2312, or JISX0208.  */
                reg[0] = charset = iso_charset_table[1][0][c];
              else if (c >= '(' && c <= '/')
                  ONE_MORE_BYTE_CHECK_MULTIBYTE (c1, multibytep);
                  if (c1 < ' ' || c1 >= 0x80
                      || (charset = iso_charset_table[1][c >= ','][c1]) < 0)
                    /* Invalid designation sequence.  Just ignore.  */
                  reg[(c - '(') % 4] = charset;
                /* Invalid designation sequence.  Just ignore.  */
          else if (c == 'N' || c == 'O')
              /* ESC <Fe> for SS2 or SS3.  */
              mask &= CODING_CATEGORY_MASK_ISO_7_ELSE;
          else if (c >= '0' && c <= '4')
              /* ESC <Fp> for start/end composition.  */
              if (COMPOSITION_OK (CODING_CATEGORY_IDX_ISO_7))
                mask_found |= CODING_CATEGORY_MASK_ISO_7;
                mask &= ~CODING_CATEGORY_MASK_ISO_7;
              if (COMPOSITION_OK (CODING_CATEGORY_IDX_ISO_7_TIGHT))
                mask_found |= CODING_CATEGORY_MASK_ISO_7_TIGHT;
                mask &= ~CODING_CATEGORY_MASK_ISO_7_TIGHT;
              if (COMPOSITION_OK (CODING_CATEGORY_IDX_ISO_8_1))
                mask_found |= CODING_CATEGORY_MASK_ISO_8_1;
                mask &= ~CODING_CATEGORY_MASK_ISO_8_1;
              if (COMPOSITION_OK (CODING_CATEGORY_IDX_ISO_8_2))
                mask_found |= CODING_CATEGORY_MASK_ISO_8_2;
                mask &= ~CODING_CATEGORY_MASK_ISO_8_2;
              if (COMPOSITION_OK (CODING_CATEGORY_IDX_ISO_7_ELSE))
                mask_found |= CODING_CATEGORY_MASK_ISO_7_ELSE;
                mask &= ~CODING_CATEGORY_MASK_ISO_7_ELSE;
              if (COMPOSITION_OK (CODING_CATEGORY_IDX_ISO_8_ELSE))
                mask_found |= CODING_CATEGORY_MASK_ISO_8_ELSE;
                mask &= ~CODING_CATEGORY_MASK_ISO_8_ELSE;
            /* Invalid escape sequence.  Just ignore.  */
          /* We found a valid designation sequence for CHARSET.  */
          mask &= ~CODING_CATEGORY_MASK_ISO_8BIT;
          c = MAKE_CHAR (charset, 0, 0);
          if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_7, charset, c))
            mask_found |= CODING_CATEGORY_MASK_ISO_7;
            mask &= ~CODING_CATEGORY_MASK_ISO_7;
          if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_7_TIGHT, charset, c))
            mask_found |= CODING_CATEGORY_MASK_ISO_7_TIGHT;
            mask &= ~CODING_CATEGORY_MASK_ISO_7_TIGHT;
          if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_7_ELSE, charset, c))
            mask_found |= CODING_CATEGORY_MASK_ISO_7_ELSE;
            mask &= ~CODING_CATEGORY_MASK_ISO_7_ELSE;
          if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_8_ELSE, charset, c))
            mask_found |= CODING_CATEGORY_MASK_ISO_8_ELSE;
            mask &= ~CODING_CATEGORY_MASK_ISO_8_ELSE;
          if (inhibit_iso_escape_detection)
          single_shifting = 0;
                  || SHIFT_OUT_OK (CODING_CATEGORY_IDX_ISO_7_ELSE)
                  || SHIFT_OUT_OK (CODING_CATEGORY_IDX_ISO_8_ELSE)))
              /* Locking shift out.  */
              mask &= ~CODING_CATEGORY_MASK_ISO_7BIT;
              mask_found |= CODING_CATEGORY_MASK_ISO_SHIFT;
          if (inhibit_iso_escape_detection)
          single_shifting = 0;
              /* Locking shift in.  */
              mask &= ~CODING_CATEGORY_MASK_ISO_7BIT;
              mask_found |= CODING_CATEGORY_MASK_ISO_SHIFT;
          single_shifting = 0;
            int newmask = CODING_CATEGORY_MASK_ISO_8_ELSE;
            if (inhibit_iso_escape_detection)
            if (c != ISO_CODE_CSI)
                if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_1]->flags
                    & CODING_FLAG_ISO_SINGLE_SHIFT)
                  newmask |= CODING_CATEGORY_MASK_ISO_8_1;
                if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_2]->flags
                    & CODING_FLAG_ISO_SINGLE_SHIFT)
                  newmask |= CODING_CATEGORY_MASK_ISO_8_2;
                single_shifting = 1;
            if (VECTORP (Vlatin_extra_code_table)
                && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c]))
                if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_1]->flags
                    & CODING_FLAG_ISO_LATIN_EXTRA)
                  newmask |= CODING_CATEGORY_MASK_ISO_8_1;
                if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_2]->flags
                    & CODING_FLAG_ISO_LATIN_EXTRA)
                  newmask |= CODING_CATEGORY_MASK_ISO_8_2;
            mask_found |= newmask;
              single_shifting = 0;
              single_shifting = 0;
              if (VECTORP (Vlatin_extra_code_table)
                  && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c]))
                  if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_1]->flags
                      & CODING_FLAG_ISO_LATIN_EXTRA)
                    newmask |= CODING_CATEGORY_MASK_ISO_8_1;
                  if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_2]->flags
                      & CODING_FLAG_ISO_LATIN_EXTRA)
                    newmask |= CODING_CATEGORY_MASK_ISO_8_2;
                  mask_found |= newmask;
              mask &= ~(CODING_CATEGORY_MASK_ISO_7BIT
                        | CODING_CATEGORY_MASK_ISO_7_ELSE);
              mask_found |= CODING_CATEGORY_MASK_ISO_8_1;
              /* Check the length of succeeding codes of the range
                 0xA0..0xFF.  If the byte length is odd, we exclude
                 CODING_CATEGORY_MASK_ISO_8_2.  We can check this only
                 when we are not single shifting.  */
              if (!single_shifting
                  && mask & CODING_CATEGORY_MASK_ISO_8_2)
                  while (src < src_end)
                      ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
                  if (i & 1 && src < src_end)
                    mask &= ~CODING_CATEGORY_MASK_ISO_8_2;
                      mask_found |= CODING_CATEGORY_MASK_ISO_8_2;
                      /* This means that we have read one extra byte.  */
  return (mask & mask_found);
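
/* Example (illustrative only, not part of Emacs): the odd-length
   test that detect_coding_iso2022 applies to runs of 0xA0..0xFF
   bytes can be sketched in isolation.  The helper name and its
   signature are hypothetical, and the sketch simplifies the loop
   above: it counts the whole run starting at P rather than
   continuing from a byte that was already read.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: return 0 if the run of bytes >= 0xA0 starting
// at P has odd length and ends before END, so it cannot consist only
// of two-byte characters; return 1 otherwise, including when the run
// is cut off by the end of the buffer and nothing can be concluded.
static int
gr_run_may_be_two_byte (const unsigned char *p, const unsigned char *end)
{
  size_t len = 0;

  assert (p <= end);
  while (p < end && *p >= 0xA0)
    {
      len++;
      p++;
    }
  return !((len & 1) && p < end);
}
```

   A result of 0 corresponds to clearing CODING_CATEGORY_MASK_ISO_8_2
   from MASK; a result of 1 corresponds to leaving that category as a
   candidate.  */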

/* Decode a character whose charset is CHARSET, whose 1st position
   code is C1, and whose 2nd position code is C2, and return the
   decoded character code.  If the variable `translation_table' is
   non-nil, return the translated code.  */

#define DECODE_ISO_CHARACTER(charset, c1, c2) \
  (NILP (translation_table) \
   ? MAKE_CHAR (charset, c1, c2) \
   : translate_char (translation_table, -1, charset, c1, c2))

/* Set designation state into CODING.  */
#define DECODE_DESIGNATION(reg, dimension, chars, final_char) \
    if (final_char < '0' || final_char >= 128) \
      goto label_invalid_code; \
    charset = ISO_CHARSET_TABLE (make_number (dimension), \
                                 make_number (chars), \
                                 make_number (final_char)); \
    c = MAKE_CHAR (charset, 0, 0); \
        && (CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) == reg \
            || CODING_SAFE_CHAR_P (safe_chars, c))) \
        if (coding->spec.iso2022.last_invalid_designation_register == 0 \
            && charset == CHARSET_ASCII) \
            /* We should insert this designation sequence as is so \
               that it is surely written back to a file.  */ \
            coding->spec.iso2022.last_invalid_designation_register = -1; \
            goto label_invalid_code; \
        coding->spec.iso2022.last_invalid_designation_register = -1; \
        if ((coding->mode & CODING_MODE_DIRECTION) \
            && CHARSET_REVERSE_CHARSET (charset) >= 0) \
          charset = CHARSET_REVERSE_CHARSET (charset); \
        CODING_SPEC_ISO_DESIGNATION (coding, reg) = charset; \
        coding->spec.iso2022.last_invalid_designation_register = reg; \
        goto label_invalid_code; \
/* Allocate a memory block for storing information about compositions.
   The block is chained to the already allocated blocks.  */

coding_allocate_composition_data (coding, char_offset)
     struct coding_system *coding;
  struct composition_data *cmp_data
    = (struct composition_data *) xmalloc (sizeof *cmp_data);

  cmp_data->char_offset = char_offset;
  cmp_data->prev = coding->cmp_data;
  cmp_data->next = NULL;
  if (coding->cmp_data)
    coding->cmp_data->next = cmp_data;
  coding->cmp_data = cmp_data;
  coding->cmp_data_start = 0;

/* Handle composition start sequence ESC 0, ESC 2, ESC 3, or ESC 4.
   ESC 0 : relative composition : ESC 0 CHAR ... ESC 1
   ESC 2 : rulebase composition : ESC 2 CHAR RULE CHAR RULE ... CHAR ESC 1
   ESC 3 : altchar composition :  ESC 3 ALT ... ESC 0 CHAR ... ESC 1
   ESC 4 : alt&rule composition : ESC 4 ALT RULE .. ALT ESC 0 CHAR ... ESC 1
  */

#define DECODE_COMPOSITION_START(c1) \
    if (coding->composing == COMPOSITION_DISABLED) \
        *dst++ = ISO_CODE_ESC; \
        *dst++ = c1 & 0x7f; \
        coding->produced_char += 2; \
    else if (!COMPOSING_P (coding)) \
        /* This is surely the start of a composition.  We must be sure \
           that coding->cmp_data has enough space to store the \
           information about the composition.  If not, terminate the \
           current decoding loop, allocate one more memory block for \
           coding->cmp_data in the caller, then start the decoding \
           loop again.  We can't allocate memory here directly because \
           it may cause buffer/string relocation.  */ \
        if (!coding->cmp_data \
            || (coding->cmp_data->used + COMPOSITION_DATA_MAX_BUNCH_LENGTH \
                >= COMPOSITION_DATA_SIZE)) \
            coding->result = CODING_FINISH_INSUFFICIENT_CMP; \
            goto label_end_of_loop; \
        coding->composing = (c1 == '0' ? COMPOSITION_RELATIVE \
                             : c1 == '2' ? COMPOSITION_WITH_RULE \
                             : c1 == '3' ? COMPOSITION_WITH_ALTCHARS \
                             : COMPOSITION_WITH_RULE_ALTCHARS); \
        CODING_ADD_COMPOSITION_START (coding, coding->produced_char, \
                                      coding->composing); \
        coding->composition_rule_follows = 0; \
        /* We are already handling a composition.  If the method is \
           the following two, the codes following the current escape \
           sequence are actual characters stored in a buffer.  */ \
        if (coding->composing == COMPOSITION_WITH_ALTCHARS \
            || coding->composing == COMPOSITION_WITH_RULE_ALTCHARS) \
            coding->composing = COMPOSITION_RELATIVE; \
            coding->composition_rule_follows = 0; \
/* Handle composition end sequence ESC 1.  */

#define DECODE_COMPOSITION_END(c1) \
    if (! COMPOSING_P (coding)) \
        *dst++ = ISO_CODE_ESC; \
        coding->produced_char += 2; \
        CODING_ADD_COMPOSITION_END (coding, coding->produced_char); \
        coding->composing = COMPOSITION_NO; \
/* Decode a composition rule from the byte C1 (and maybe one more byte
   from SRC) and store one encoded composition rule in
   coding->cmp_data.  */

#define DECODE_COMPOSITION_RULE(c1) \
    if (c1 < 81) /* old format (before ver.21) */ \
        int gref = (c1) / 9; \
        int nref = (c1) % 9; \
        if (gref == 4) gref = 10; \
        if (nref == 4) nref = 10; \
        rule = COMPOSITION_ENCODE_RULE (gref, nref); \
    else if (c1 < 93) /* new format (after ver.21) */ \
        ONE_MORE_BYTE (c2); \
        rule = COMPOSITION_ENCODE_RULE (c1 - 81, c2 - 32); \
    CODING_ADD_COMPOSITION_COMPONENT (coding, rule); \
    coding->composition_rule_follows = 0; \
/* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions".  */
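
/* Example (illustrative only, not part of Emacs): the two rule
   formats that DECODE_COMPOSITION_RULE distinguishes can be decoded
   into a pair of reference points as follows.  The struct and the
   function name are hypothetical.  C1 is taken to be the value the
   macro compares against 81 and 93; any offset applied to the raw
   byte before that comparison is not modelled, and the packing done
   by COMPOSITION_ENCODE_RULE is left out.

```c
#include <assert.h>

// Hypothetical result type: the two reference points named by a rule.
struct rule_refs { int gref, nref; };

// Decode rule value C1, plus the second byte C2 for the newer
// two-byte form.  Returns 0 on success, -1 if C1 fits neither form.
static int
decode_rule_refs (int c1, int c2, struct rule_refs *out)
{
  assert (out != 0);
  if (c1 < 0)
    return -1;
  if (c1 < 81)                  // old one-byte form: gref and nref in base 9
    {
      out->gref = c1 / 9;
      out->nref = c1 % 9;
      if (out->gref == 4) out->gref = 10;
      if (out->nref == 4) out->nref = 10;
      return 0;
    }
  if (c1 < 93)                  // newer two-byte form
    {
      out->gref = c1 - 81;
      out->nref = c2 - 32;
      return 0;
    }
  return -1;
}
```

   For instance, the old-format value 13 gives gref 1 and nref 4,
   which the remapping step turns into nref 10.  */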

decode_coding_iso2022 (coding, source, destination, src_bytes, dst_bytes)
     struct coding_system *coding;
     unsigned char *source, *destination;
     int src_bytes, dst_bytes;
  unsigned char *src = source;
  unsigned char *src_end = source + src_bytes;
  unsigned char *dst = destination;
  unsigned char *dst_end = destination + dst_bytes;
  /* Charsets invoked to graphic plane 0 and 1 respectively.  */
  int charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
  int charset1 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 1);
  /* SRC_BASE remembers the start position in source in each loop.
     The loop will be exited when there's not enough source code
     (within macro ONE_MORE_BYTE), or when there's not enough
     destination area to produce a character (within macro
     EMIT_CHAR).  */
  unsigned char *src_base;
  Lisp_Object translation_table;
  Lisp_Object safe_chars;

  safe_chars = coding_safe_chars (coding->symbol);

  if (NILP (Venable_character_translation))
    translation_table = Qnil;
      translation_table = coding->translation_table_for_decode;
      if (NILP (translation_table))
        translation_table = Vstandard_translation_table_for_decode;

  coding->result = CODING_FINISH_NORMAL;
      /* We produce no character or one character.  */
      switch (iso_code_class[c1])
        case ISO_0x20_or_0x7F:
          if (COMPOSING_P (coding) && coding->composition_rule_follows)
              DECODE_COMPOSITION_RULE (c1);
          if (charset0 < 0 || CHARSET_CHARS (charset0) == 94)
              /* This is SPACE or DEL.  */
              charset = CHARSET_ASCII;
          /* This is a graphic character, we fall down ...  */

        case ISO_graphic_plane_0:
          if (COMPOSING_P (coding) && coding->composition_rule_follows)
              DECODE_COMPOSITION_RULE (c1);

        case ISO_0xA0_or_0xFF:
          if (charset1 < 0 || CHARSET_CHARS (charset1) == 94
              || coding->flags & CODING_FLAG_ISO_SEVEN_BITS)
            goto label_invalid_code;
          /* This is a graphic character, we fall down ...  */

        case ISO_graphic_plane_1:
            goto label_invalid_code;
          if (COMPOSING_P (coding))
            DECODE_COMPOSITION_END ('1');

          /* All ISO2022 control characters in this class have the
             same representation in Emacs internal format.  */
              && (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
              && (coding->eol_type == CODING_EOL_CR
                  || coding->eol_type == CODING_EOL_CRLF))
              coding->result = CODING_FINISH_INCONSISTENT_EOL;
              goto label_end_of_loop;
          charset = CHARSET_ASCII;

          if (COMPOSING_P (coding))
            DECODE_COMPOSITION_END ('1');
          goto label_invalid_code;

        case ISO_carriage_return:
          if (COMPOSING_P (coding))
            DECODE_COMPOSITION_END ('1');

          if (coding->eol_type == CODING_EOL_CR)
          else if (coding->eol_type == CODING_EOL_CRLF)
              if (c1 != ISO_CODE_LF)
          charset = CHARSET_ASCII;

          if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT)
              || CODING_SPEC_ISO_DESIGNATION (coding, 1) < 0)
            goto label_invalid_code;
          CODING_SPEC_ISO_INVOCATION (coding, 0) = 1;
          charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);

          if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT))
            goto label_invalid_code;
          CODING_SPEC_ISO_INVOCATION (coding, 0) = 0;
          charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);

        case ISO_single_shift_2_7:
        case ISO_single_shift_2:
          if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT))
            goto label_invalid_code;
          /* SS2 is handled as an escape sequence of ESC 'N' */
          goto label_escape_sequence;

        case ISO_single_shift_3:
          if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT))
            goto label_invalid_code;
          /* SS3 is handled as an escape sequence of ESC 'O' */
          goto label_escape_sequence;

        case ISO_control_sequence_introducer:
          /* CSI is handled as an escape sequence of ESC '[' ...  */
          goto label_escape_sequence;

        label_escape_sequence:
          /* Escape sequences handled by Emacs are invocation,
             designation, direction specification, and character
             composition specification.  */
            case '&': /* revision of following character set */
              if (!(c1 >= '@' && c1 <= '~'))
                goto label_invalid_code;
              if (c1 != ISO_CODE_ESC)
                goto label_invalid_code;
              goto label_escape_sequence;

            case '$': /* designation of 2-byte character set */
              if (! (coding->flags & CODING_FLAG_ISO_DESIGNATION))
                goto label_invalid_code;
              if (c1 >= '@' && c1 <= 'B')
                { /* designation of JISX0208.1978, GB2312.1980,
                     or JISX0208.1980 */
                  DECODE_DESIGNATION (0, 2, 94, c1);
              else if (c1 >= 0x28 && c1 <= 0x2B)
                { /* designation of DIMENSION2_CHARS94 character set */
                  DECODE_DESIGNATION (c1 - 0x28, 2, 94, c2);
              else if (c1 >= 0x2C && c1 <= 0x2F)
                { /* designation of DIMENSION2_CHARS96 character set */
                  DECODE_DESIGNATION (c1 - 0x2C, 2, 96, c2);
                goto label_invalid_code;
              /* We must update these variables now.  */
              charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
              charset1 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 1);

            case 'n': /* invocation of locking-shift-2 */
              if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT)
                  || CODING_SPEC_ISO_DESIGNATION (coding, 2) < 0)
                goto label_invalid_code;
              CODING_SPEC_ISO_INVOCATION (coding, 0) = 2;
              charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);

            case 'o': /* invocation of locking-shift-3 */
              if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT)
                  || CODING_SPEC_ISO_DESIGNATION (coding, 3) < 0)
                goto label_invalid_code;
              CODING_SPEC_ISO_INVOCATION (coding, 0) = 3;
              charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);

            case 'N': /* invocation of single-shift-2 */
              if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)
                  || CODING_SPEC_ISO_DESIGNATION (coding, 2) < 0)
                goto label_invalid_code;
              charset = CODING_SPEC_ISO_DESIGNATION (coding, 2);
              if (c1 < 0x20 || (c1 >= 0x80 && c1 < 0xA0))
                goto label_invalid_code;

            case 'O': /* invocation of single-shift-3 */
              if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)
                  || CODING_SPEC_ISO_DESIGNATION (coding, 3) < 0)
                goto label_invalid_code;
              charset = CODING_SPEC_ISO_DESIGNATION (coding, 3);
              if (c1 < 0x20 || (c1 >= 0x80 && c1 < 0xA0))
                goto label_invalid_code;

            case '0': case '2': case '3': case '4': /* start composition */
              DECODE_COMPOSITION_START (c1);

            case '1': /* end composition */
              DECODE_COMPOSITION_END (c1);

            case '[': /* specification of direction */
              if (coding->flags & CODING_FLAG_ISO_NO_DIRECTION)
                goto label_invalid_code;
              /* For the moment, nested direction is not supported.
                 So, `coding->mode & CODING_MODE_DIRECTION' zero means
                 left-to-right, and nonzero means right-to-left.  */
                case ']': /* end of the current direction */
                  coding->mode &= ~CODING_MODE_DIRECTION;

                case '0': /* end of the current direction */
                case '1': /* start of left-to-right direction */
                    coding->mode &= ~CODING_MODE_DIRECTION;
                    goto label_invalid_code;

                case '2': /* start of right-to-left direction */
                    coding->mode |= CODING_MODE_DIRECTION;
                    goto label_invalid_code;

                  goto label_invalid_code;

              if (COMPOSING_P (coding))
                DECODE_COMPOSITION_END ('1');
                  /* CTEXT extended segment:
                     ESC % / [0-4] M L --ENCODING-NAME-- \002 --BYTES--
                     We keep these bytes as is for the moment.
                     They may be decoded by post-read-conversion.  */
                  ONE_MORE_BYTE (dim);
                  size = ((M - 128) * 128) + (L - 128);
                  required = 8 + size * 2;
                  if (dst + required > (dst_bytes ? dst_end : src))
                    goto label_end_of_loop;
                  *dst++ = ISO_CODE_ESC;
                  dst += CHAR_STRING (M, dst), produced_chars++;
                  dst += CHAR_STRING (L, dst), produced_chars++;
                      dst += CHAR_STRING (c1, dst), produced_chars++;
                  coding->produced_char += produced_chars;
                  unsigned char *d = dst;
                  /* XFree86 extension for embedding UTF-8 in CTEXT:
                     ESC % G --UTF-8-BYTES-- ESC % @
                     We keep these bytes as is for the moment.
                     They may be decoded by post-read-conversion.  */
                  if (d + 6 > (dst_bytes ? dst_end : src))
                    goto label_end_of_loop;
                  *d++ = ISO_CODE_ESC;
                  while (d + 1 < (dst_bytes ? dst_end : src))
                      if (c1 == ISO_CODE_ESC
                          && src + 1 < src_end
                      d += CHAR_STRING (c1, d), produced_chars++;
                  if (d + 3 > (dst_bytes ? dst_end : src))
                    goto label_end_of_loop;
                  *d++ = ISO_CODE_ESC;
                  coding->produced_char += produced_chars + 3;
                goto label_invalid_code;
              if (! (coding->flags & CODING_FLAG_ISO_DESIGNATION))
                goto label_invalid_code;
              if (c1 >= 0x28 && c1 <= 0x2B)
                { /* designation of DIMENSION1_CHARS94 character set */
                  DECODE_DESIGNATION (c1 - 0x28, 1, 94, c2);
              else if (c1 >= 0x2C && c1 <= 0x2F)
                { /* designation of DIMENSION1_CHARS96 character set */
                  DECODE_DESIGNATION (c1 - 0x2C, 1, 96, c2);
                goto label_invalid_code;
              /* We must update these variables now.  */
              charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
              charset1 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 1);

      /* Now we know CHARSET and 1st position code C1 of a character.
         Produce a multibyte sequence for that character while getting
         2nd position code C2 if necessary.  */
      if (CHARSET_DIMENSION (charset) == 2)
          if (c1 < 0x80 ? c2 < 0x20 || c2 >= 0x80 : c2 < 0xA0)
            /* C2 is not in a valid range.  */
            goto label_invalid_code;
      c = DECODE_ISO_CHARACTER (charset, c1, c2);

      if (COMPOSING_P (coding))
        DECODE_COMPOSITION_END ('1');

  coding->consumed = coding->consumed_char = src_base - source;
  coding->produced = dst - destination;
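
/* Example (illustrative only, not part of Emacs): in the CTEXT
   extended segment branch of decode_coding_iso2022, the segment
   length is carried in two bytes M and L and computed as
   ((M - 128) * 128) + (L - 128).  The helper below isolates that
   arithmetic.  Its name is hypothetical, and the guard that rejects
   bytes below 128 is an assumption added for the sketch rather than
   something taken from the code above.

```c
#include <assert.h>

// Hypothetical helper: length announced by the length bytes M and L
// of a CTEXT extended segment, or -1 if either byte is below 128.
static int
ctext_segment_size (unsigned char m, unsigned char l)
{
  if (m < 128 || l < 128)
    return -1;
  return (m - 128) * 128 + (l - 128);
}
```

   Each byte contributes seven bits, so the largest length that can
   be expressed this way is 127 * 128 + 127 = 16383.  */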

/* ISO2022 encoding stuff.  */

/*
   It is not enough to say just "ISO2022" on encoding, we have to
   specify more details.  In Emacs, each ISO2022 coding system
   variant has the following specifications:
        1. Initial designation to G0 through G3.
        2. Allows short-form designation?
        3. ASCII should be designated to G0 before control characters?
        4. ASCII should be designated to G0 at end of line?
        5. 7-bit environment or 8-bit environment?
        6. Use locking-shift?
        7. Use single-shift?
   And the following two are only for Japanese:
        8. Use ASCII in place of JISX0201-1976-Roman?
        9. Use JISX0208-1983 in place of JISX0208-1978?
   These specifications are encoded in `coding->flags' as flag bits
   defined by macros CODING_FLAG_ISO_XXX.  See `coding.h' for more
   details.
*/

/* Produce codes (escape sequence) for designating CHARSET to graphic
   register REG at DST, and increment DST. If <final-char> of CHARSET is
   '@', 'A', or 'B' and the coding system CODING allows, produce
   designation sequence of short-form. */
#define ENCODE_DESIGNATION(charset, reg, coding) \
  unsigned char final_char = CHARSET_ISO_FINAL_CHAR (charset); \
  char *intermediate_char_94 = "()*+"; \
  char *intermediate_char_96 = ",-./"; \
  int revision = CODING_SPEC_ISO_REVISION_NUMBER(coding, charset); \
  if (revision < 255) \
  *dst++ = ISO_CODE_ESC; \
  *dst++ = '@' + revision; \
  *dst++ = ISO_CODE_ESC; \
  if (CHARSET_DIMENSION (charset) == 1) \
  if (CHARSET_CHARS (charset) == 94) \
  *dst++ = (unsigned char) (intermediate_char_94[reg]); \
  *dst++ = (unsigned char) (intermediate_char_96[reg]); \
  if (CHARSET_CHARS (charset) == 94) \
  if (! (coding->flags & CODING_FLAG_ISO_SHORT_FORM) \
  || final_char < '@' || final_char > 'B') \
  *dst++ = (unsigned char) (intermediate_char_94[reg]); \
  *dst++ = (unsigned char) (intermediate_char_96[reg]); \
  *dst++ = final_char; \
  CODING_SPEC_ISO_DESIGNATION (coding, reg) = charset; \
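/* Illustration only, not part of Emacs: a minimal standalone sketch of
   the designation sequence that ENCODE_DESIGNATION builds. The struct
   `charset_info' and the function `designation_sequence' are
   hypothetical names. The revision prefix is left out, and the rules
   that multibyte sets use ESC $ and that short form applies only to
   G0 are taken from ISO 2022 itself, not from the lines shown here. */

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical charset description; field names are illustrative. */
struct charset_info
{
  int dimension;            /* 1 or 2 bytes per character */
  int chars;                /* 94 or 96 characters per dimension */
  unsigned char final_char; /* ISO 2022 <final-char> */
};

/* Write the escape sequence designating CS to graphic register REG
   (0..3) at DST and return the number of bytes written (at most 4).
   A nonzero SHORT_FORM allows ESC $ @, ESC $ A and ESC $ B for G0. */
size_t
designation_sequence (const struct charset_info *cs, int reg,
                      int short_form, unsigned char *dst)
{
  static const char intermediate_94[] = "()*+";
  static const char intermediate_96[] = ",-./";
  unsigned char *p = dst;

  assert (reg >= 0 && reg < 4);
  *p++ = 0x1B;                  /* ESC */
  if (cs->dimension == 2)
    *p++ = '$';                 /* multibyte charset */
  if (cs->chars == 96)
    *p++ = (unsigned char) intermediate_96[reg];
  else if (cs->dimension == 1
           || !short_form || reg != 0
           || cs->final_char < '@' || cs->final_char > 'B')
    *p++ = (unsigned char) intermediate_94[reg];
  *p++ = cs->final_char;
  return (size_t) (p - dst);
}
```

/* Under these assumptions, ASCII to G0 gives ESC ( B, a 96-charset with
   final 'A' to G1 gives ESC - A, and a two-byte 94-charset with final
   'B' to G0 gives ESC $ B when short form is allowed, ESC $ ( B
   otherwise. */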
/* The following two macros produce codes (control character or escape
   sequence) for ISO2022 single-shift functions (single-shift-2 and
#define ENCODE_SINGLE_SHIFT_2 \
  if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
  *dst++ = ISO_CODE_ESC, *dst++ = 'N'; \
  *dst++ = ISO_CODE_SS2; \
  CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 1; \
#define ENCODE_SINGLE_SHIFT_3 \
  if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
  *dst++ = ISO_CODE_ESC, *dst++ = 'O'; \
  *dst++ = ISO_CODE_SS3; \
  CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 1; \
/* The following four macros produce codes (control character or
   escape sequence) for ISO2022 locking-shift functions (shift-in,
   shift-out, locking-shift-2, and locking-shift-3). */
#define ENCODE_SHIFT_IN \
  *dst++ = ISO_CODE_SI; \
  CODING_SPEC_ISO_INVOCATION (coding, 0) = 0; \
#define ENCODE_SHIFT_OUT \
  *dst++ = ISO_CODE_SO; \
  CODING_SPEC_ISO_INVOCATION (coding, 0) = 1; \
#define ENCODE_LOCKING_SHIFT_2 \
  *dst++ = ISO_CODE_ESC, *dst++ = 'n'; \
  CODING_SPEC_ISO_INVOCATION (coding, 0) = 2; \
#define ENCODE_LOCKING_SHIFT_3 \
  *dst++ = ISO_CODE_ESC, *dst++ = 'o'; \
  CODING_SPEC_ISO_INVOCATION (coding, 0) = 3; \
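/* Illustration only, not part of Emacs: a standalone sketch of the byte
   sequences the shift macros above emit. The names `shift_kind' and
   `shift_sequence' are hypothetical. The control-code values (SI 0x0F,
   SO 0x0E, SS2 0x8E, SS3 0x8F) are the standard ISO 2022 assignments
   and are assumed to match the ISO_CODE_XXX constants, which are
   defined outside this section. */

```c
#include <assert.h>
#include <stddef.h>

enum shift_kind
{
  SHIFT_IN, SHIFT_OUT, LOCKING_SHIFT_2, LOCKING_SHIFT_3,
  SINGLE_SHIFT_2, SINGLE_SHIFT_3
};

/* Write the shift function KIND at DST and return the number of bytes
   written (1 or 2). A nonzero SEVEN_BITS selects the escape-sequence
   form of the single shifts instead of the 8-bit control codes. */
size_t
shift_sequence (enum shift_kind kind, int seven_bits, unsigned char *dst)
{
  unsigned char *p = dst;

  assert (dst != NULL);
  switch (kind)
    {
    case SHIFT_IN:        *p++ = 0x0F; break;             /* SI */
    case SHIFT_OUT:       *p++ = 0x0E; break;             /* SO */
    case LOCKING_SHIFT_2: *p++ = 0x1B; *p++ = 'n'; break; /* ESC n */
    case LOCKING_SHIFT_3: *p++ = 0x1B; *p++ = 'o'; break; /* ESC o */
    case SINGLE_SHIFT_2:
      if (seven_bits)
        *p++ = 0x1B, *p++ = 'N';                          /* ESC N */
      else
        *p++ = 0x8E;                                      /* SS2 */
      break;
    case SINGLE_SHIFT_3:
      if (seven_bits)
        *p++ = 0x1B, *p++ = 'O';                          /* ESC O */
      else
        *p++ = 0x8F;                                      /* SS3 */
      break;
    }
  return (size_t) (p - dst);
}
```

/* A single shift affects only the next character, which is why the
   macros above set a one-shot flag, while a locking shift changes the
   recorded invocation of graphic plane 0 until the next shift. */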
/* Produce codes for a DIMENSION1 character whose character set is
   CHARSET and whose position-code is C1. Designation and invocation
   sequences are also produced in advance if necessary. */
#define ENCODE_ISO_CHARACTER_DIMENSION1(charset, c1) \
  if (CODING_SPEC_ISO_SINGLE_SHIFTING (coding)) \
  if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
  *dst++ = c1 & 0x7F; \
  *dst++ = c1 | 0x80; \
  CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 0; \
  else if (charset == CODING_SPEC_ISO_PLANE_CHARSET (coding, 0)) \
  *dst++ = c1 & 0x7F; \
  else if (charset == CODING_SPEC_ISO_PLANE_CHARSET (coding, 1)) \
  *dst++ = c1 | 0x80; \
  /* Since CHARSET is not yet invoked to any graphic planes, we \
     must invoke it, or, at first, designate it to some graphic \
     register. Then repeat the loop to actually produce the \
  dst = encode_invocation_designation (charset, coding, dst); \
/* Produce codes for a DIMENSION2 character whose character set is
   CHARSET and whose position-codes are C1 and C2. Designation and
   invocation codes are also produced in advance if necessary. */
#define ENCODE_ISO_CHARACTER_DIMENSION2(charset, c1, c2) \
  if (CODING_SPEC_ISO_SINGLE_SHIFTING (coding)) \
  if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
  *dst++ = c1 & 0x7F, *dst++ = c2 & 0x7F; \
  *dst++ = c1 | 0x80, *dst++ = c2 | 0x80; \
  CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 0; \
  else if (charset == CODING_SPEC_ISO_PLANE_CHARSET (coding, 0)) \
  *dst++ = c1 & 0x7F, *dst++ = c2 & 0x7F; \
  else if (charset == CODING_SPEC_ISO_PLANE_CHARSET (coding, 1)) \
  *dst++ = c1 | 0x80, *dst++ = c2 | 0x80; \
  /* Since CHARSET is not yet invoked to any graphic planes, we \
     must invoke it, or, at first, designate it to some graphic \
     register. Then repeat the loop to actually produce the \
  dst = encode_invocation_designation (charset, coding, dst); \
#define ENCODE_ISO_CHARACTER(c) \
  int charset, c1, c2; \
  SPLIT_CHAR (c, charset, c1, c2); \
  if (CHARSET_DEFINED_P (charset)) \
  if (CHARSET_DIMENSION (charset) == 1) \
  if (charset == CHARSET_ASCII \
  && coding->flags & CODING_FLAG_ISO_USE_ROMAN) \
  charset = charset_latin_jisx0201; \
  ENCODE_ISO_CHARACTER_DIMENSION1 (charset, c1); \
  if (charset == charset_jisx0208 \
  && coding->flags & CODING_FLAG_ISO_USE_OLDJIS) \
  charset = charset_jisx0208_1978; \
  ENCODE_ISO_CHARACTER_DIMENSION2 (charset, c1, c2); \
/* Instead of encoding character C, produce one or two `?'s. */
#define ENCODE_UNSAFE_CHARACTER(c) \
  ENCODE_ISO_CHARACTER (CODING_REPLACEMENT_CHARACTER); \
  if (CHARSET_WIDTH (CHAR_CHARSET (c)) > 1) \
  ENCODE_ISO_CHARACTER (CODING_REPLACEMENT_CHARACTER); \
/* Produce designation and invocation codes at a place pointed by DST
   to use CHARSET. The element `spec.iso2022' of *CODING is updated.
encode_invocation_designation (charset, coding, dst)
struct coding_system *coding;
int reg; /* graphic register number */
/* At first, check designations. */
for (reg = 0; reg < 4; reg++)
if (charset == CODING_SPEC_ISO_DESIGNATION (coding, reg))
/* CHARSET is not yet designated to any graphic registers. */
/* At first check the requested designation. */
reg = CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset);
if (reg == CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION)
/* Since CHARSET requests no special designation, designate it
   to graphic register 0. */
ENCODE_DESIGNATION (charset, reg, coding);
if (CODING_SPEC_ISO_INVOCATION (coding, 0) != reg
&& CODING_SPEC_ISO_INVOCATION (coding, 1) != reg)
/* Since the graphic register REG is not invoked to any graphic
   planes, invoke it to graphic plane 0. */
case 0: /* graphic register 0 */
case 1: /* graphic register 1 */
case 2: /* graphic register 2 */
if (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)
ENCODE_SINGLE_SHIFT_2;
ENCODE_LOCKING_SHIFT_2;
case 3: /* graphic register 3 */
if (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)
ENCODE_SINGLE_SHIFT_3;
ENCODE_LOCKING_SHIFT_3;
/* Produce 2-byte codes for encoded composition rule RULE. */
#define ENCODE_COMPOSITION_RULE(rule) \
  COMPOSITION_DECODE_RULE (rule, gref, nref); \
  *dst++ = 32 + 81 + gref; \
  *dst++ = 32 + nref; \
/* Produce codes for indicating the start of a composition sequence
   (ESC 0, ESC 3, or ESC 4). DATA points to an array of integers
   which specify information about the composition. See the comment
   in coding.h for the format of DATA. */
#define ENCODE_COMPOSITION_START(coding, data) \
  coding->composing = data[3]; \
  *dst++ = ISO_CODE_ESC; \
  if (coding->composing == COMPOSITION_RELATIVE) \
  *dst++ = (coding->composing == COMPOSITION_WITH_ALTCHARS \
  coding->cmp_data_index = coding->cmp_data_start + 4; \
  coding->composition_rule_follows = 0; \
/* Produce codes for indicating the end of the current composition. */
#define ENCODE_COMPOSITION_END(coding, data) \
  *dst++ = ISO_CODE_ESC; \
  coding->cmp_data_start += data[0]; \
  coding->composing = COMPOSITION_NO; \
  if (coding->cmp_data_start == coding->cmp_data->used \
  && coding->cmp_data->next) \
  coding->cmp_data = coding->cmp_data->next; \
  coding->cmp_data_start = 0; \
/* Produce composition start sequence ESC 0. Here, this sequence
   doesn't mean the start of a new composition but means that we have
   just produced components (alternate chars and composition rules) of
   the composition and the actual text follows in SRC. */
#define ENCODE_COMPOSITION_FAKE_START(coding) \
  *dst++ = ISO_CODE_ESC; \
  coding->composing = COMPOSITION_RELATIVE; \
/* The following three macros produce codes for indicating direction
#define ENCODE_CONTROL_SEQUENCE_INTRODUCER \
  if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) \
  *dst++ = ISO_CODE_ESC, *dst++ = '['; \
  *dst++ = ISO_CODE_CSI; \
#define ENCODE_DIRECTION_R2L \
  ENCODE_CONTROL_SEQUENCE_INTRODUCER (dst), *dst++ = '2', *dst++ = ']'
#define ENCODE_DIRECTION_L2R \
  ENCODE_CONTROL_SEQUENCE_INTRODUCER (dst), *dst++ = '0', *dst++ = ']'
/* Produce codes for designation and invocation to reset the graphic
   planes and registers to initial state. */
#define ENCODE_RESET_PLANE_AND_REGISTER \
  if (CODING_SPEC_ISO_INVOCATION (coding, 0) != 0) \
  for (reg = 0; reg < 4; reg++) \
  if (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, reg) >= 0 \
  && (CODING_SPEC_ISO_DESIGNATION (coding, reg) \
  != CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, reg))) \
  ENCODE_DESIGNATION \
  (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, reg), reg, coding); \
/* Produce designation sequences of charsets in the line started from
   SRC to a place pointed by DST, and return updated DST.

   If the current block ends before any end-of-line, we may fail to
   find all the necessary designations. */
static unsigned char *
encode_designation_at_bol (coding, translation_table, src, src_end, dst)
struct coding_system *coding;
Lisp_Object translation_table;
unsigned char *src, *src_end, *dst;
int charset, c, found = 0, reg;
/* Table of charsets to be designated to each graphic register. */
for (reg = 0; reg < 4; reg++)
charset = CHAR_CHARSET (c);
reg = CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset);
if (reg != CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION && r[reg] < 0)
for (reg = 0; reg < 4; reg++)
&& CODING_SPEC_ISO_DESIGNATION (coding, reg) != r[reg])
ENCODE_DESIGNATION (r[reg], reg, coding);
/* See the above "GENERAL NOTES on `encode_coding_XXX ()' functions". */
encode_coding_iso2022 (coding, source, destination, src_bytes, dst_bytes)
struct coding_system *coding;
unsigned char *source, *destination;
int src_bytes, dst_bytes;
unsigned char *src = source;
unsigned char *src_end = source + src_bytes;
unsigned char *dst = destination;
unsigned char *dst_end = destination + dst_bytes;
/* Since the maximum bytes produced by each loop is 20, we subtract 19
   from DST_END to assure overflow checking is necessary only at the
unsigned char *adjusted_dst_end = dst_end - 19;
/* SRC_BASE remembers the start position in source in each loop.
   The loop will be exited when there's not enough source text to
   analyze multi-byte codes (within macro ONE_MORE_CHAR), or when
   there's not enough destination area to produce encoded codes
   (within macro EMIT_BYTES). */
unsigned char *src_base;
Lisp_Object translation_table;
Lisp_Object safe_chars;
if (coding->flags & CODING_FLAG_ISO_SAFE)
coding->mode |= CODING_MODE_INHIBIT_UNENCODABLE_CHAR;
safe_chars = coding_safe_chars (coding->symbol);
if (NILP (Venable_character_translation))
translation_table = Qnil;
translation_table = coding->translation_table_for_encode;
if (NILP (translation_table))
translation_table = Vstandard_translation_table_for_encode;
coding->consumed_char = 0;
if (dst >= (dst_bytes ? adjusted_dst_end : (src - 19)))
coding->result = CODING_FINISH_INSUFFICIENT_DST;
if (coding->flags & CODING_FLAG_ISO_DESIGNATE_AT_BOL
&& CODING_SPEC_ISO_BOL (coding))
/* We have to produce designation sequences if any now. */
dst = encode_designation_at_bol (coding, translation_table,
CODING_SPEC_ISO_BOL (coding) = 0;
/* Check composition start and end. */
if (coding->composing != COMPOSITION_DISABLED
&& coding->cmp_data_start < coding->cmp_data->used)
struct composition_data *cmp_data = coding->cmp_data;
int *data = cmp_data->data + coding->cmp_data_start;
int this_pos = cmp_data->char_offset + coding->consumed_char;
if (coding->composing == COMPOSITION_RELATIVE)
if (this_pos == data[2])
ENCODE_COMPOSITION_END (coding, data);
cmp_data = coding->cmp_data;
data = cmp_data->data + coding->cmp_data_start;
else if (COMPOSING_P (coding))
/* COMPOSITION_WITH_ALTCHARS or COMPOSITION_WITH_RULE_ALTCHAR */
if (coding->cmp_data_index == coding->cmp_data_start + data[0])
/* We have consumed components of the composition.
   What follows in SRC is the composition's base
ENCODE_COMPOSITION_FAKE_START (coding);
int c = cmp_data->data[coding->cmp_data_index++];
if (coding->composition_rule_follows)
ENCODE_COMPOSITION_RULE (c);
coding->composition_rule_follows = 0;
if (coding->mode & CODING_MODE_INHIBIT_UNENCODABLE_CHAR
&& ! CODING_SAFE_CHAR_P (safe_chars, c))
ENCODE_UNSAFE_CHARACTER (c);
ENCODE_ISO_CHARACTER (c);
if (coding->composing == COMPOSITION_WITH_RULE_ALTCHARS)
coding->composition_rule_follows = 1;
if (!COMPOSING_P (coding))
if (this_pos == data[1])
ENCODE_COMPOSITION_START (coding, data);
/* Now encode the character C. */
if (c < 0x20 || c == 0x7F)
if (! (coding->mode & CODING_MODE_SELECTIVE_DISPLAY))
if (coding->flags & CODING_FLAG_ISO_RESET_AT_CNTL)
ENCODE_RESET_PLANE_AND_REGISTER;
/* fall through to treat '\r' as '\n' ... */
if (coding->flags & CODING_FLAG_ISO_RESET_AT_EOL)
ENCODE_RESET_PLANE_AND_REGISTER;
if (coding->flags & CODING_FLAG_ISO_INIT_AT_BOL)
bcopy (coding->spec.iso2022.initial_designation,
coding->spec.iso2022.current_designation,
sizeof coding->spec.iso2022.initial_designation);
if (coding->eol_type == CODING_EOL_LF
|| coding->eol_type == CODING_EOL_UNDECIDED)
*dst++ = ISO_CODE_LF;
else if (coding->eol_type == CODING_EOL_CRLF)
*dst++ = ISO_CODE_CR, *dst++ = ISO_CODE_LF;
*dst++ = ISO_CODE_CR;
CODING_SPEC_ISO_BOL (coding) = 1;
if (coding->flags & CODING_FLAG_ISO_RESET_AT_CNTL)
ENCODE_RESET_PLANE_AND_REGISTER;
else if (ASCII_BYTE_P (c))
ENCODE_ISO_CHARACTER (c);
else if (SINGLE_BYTE_CHAR_P (c))
else if (coding->mode & CODING_MODE_INHIBIT_UNENCODABLE_CHAR
&& ! CODING_SAFE_CHAR_P (safe_chars, c))
ENCODE_UNSAFE_CHARACTER (c);
ENCODE_ISO_CHARACTER (c);
coding->consumed_char++;
coding->consumed = src_base - source;
coding->produced = coding->produced_char = dst - destination;
/*** 4. SJIS and BIG5 handlers ***/
/* Although SJIS and BIG5 are not ISO coding systems, they are used
   quite widely. So, for the moment, Emacs supports them in the bare
   C code. But, in the future, they may be supported only by CCL. */
/* SJIS is a coding system encoding three character sets: ASCII, right
   half of JISX0201-Kana, and JISX0208. An ASCII character is encoded
   as is. A character of charset katakana-jisx0201 is encoded by
   "position-code + 0x80". A character of charset japanese-jisx0208
   is encoded in two bytes, but the two position-codes are divided and
   shifted so that they fit in the ranges below.
   --- CODE RANGE of SJIS ---
   (character set)       (range)
   KATAKANA-JISX0201     0xA1 .. 0xDF
   JISX0208 (1st byte)   0x81 .. 0x9F and 0xE0 .. 0xEF
            (2nd byte)   0x40 .. 0x7E and 0x80 .. 0xFC
   -------------------------------
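/* Illustration only, not part of Emacs: a standalone sketch of the
   "divide and shift" arithmetic between JISX0208 position-codes
   (0x21 .. 0x7E each) and SJIS byte pairs. The function names are
   hypothetical. The DECODE_SJIS and ENCODE_SJIS macros used in this
   file are defined elsewhere, so this is the commonly used conversion
   written out under that assumption, not a copy of them. */

```c
#include <assert.h>

/* Convert SJIS bytes S1, S2 to JISX0208 position-codes *J1, *J2. */
void
sjis_to_jis (int s1, int s2, int *j1, int *j2)
{
  assert ((s1 >= 0x81 && s1 <= 0x9F) || (s1 >= 0xE0 && s1 <= 0xEF));
  assert (s2 >= 0x40 && s2 <= 0xFC && s2 != 0x7F);
  if (s2 >= 0x9F)
    {
      /* Upper half of the second-byte range: even JIS row. */
      *j1 = s1 * 2 - (s1 >= 0xE0 ? 0x160 : 0xE0);
      *j2 = s2 - 0x7E;
    }
  else
    {
      /* Lower half: odd JIS row; 0x7F is skipped in SJIS. */
      *j1 = s1 * 2 - (s1 >= 0xE0 ? 0x161 : 0xE1);
      *j2 = s2 - (s2 >= 0x7F ? 0x20 : 0x1F);
    }
}

/* Convert JISX0208 position-codes J1, J2 to SJIS bytes *S1, *S2. */
void
jis_to_sjis (int j1, int j2, int *s1, int *s2)
{
  assert (j1 >= 0x21 && j1 <= 0x7E && j2 >= 0x21 && j2 <= 0x7E);
  if (j1 & 1)
    {
      *s1 = j1 / 2 + (j1 < 0x5F ? 0x71 : 0xB1);
      *s2 = j2 + (j2 >= 0x60 ? 0x20 : 0x1F);
    }
  else
    {
      *s1 = j1 / 2 + (j1 < 0x5F ? 0x70 : 0xB0);
      *s2 = j2 + 0x7E;
    }
}
```

/* Two JIS rows share one SJIS lead byte, which is how 94 rows fit into
   the 47 lead bytes 0x81 .. 0x9F and 0xE0 .. 0xEF. */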
/* BIG5 is a coding system encoding two character sets: ASCII and
   Big5. An ASCII character is encoded as is. Big5 is a two-byte
   character set and is encoded in two bytes.
   --- CODE RANGE of BIG5 ---
   (character set)       (range)
   Big5 (1st byte)       0xA1 .. 0xFE
        (2nd byte)       0x40 .. 0x7E and 0xA1 .. 0xFE
   --------------------------
   Since the number of characters in Big5 is larger than maximum
   characters in Emacs' charset (96x96), it can't be handled as one
   charset. So, in Emacs, Big5 is divided into two: `charset-big5-1'
   and `charset-big5-2'. Both are DIMENSION2 and CHARS94. The former
   contains frequently used characters and the latter contains less
   frequently used characters. */
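/* Illustration only, not part of Emacs: a standalone sketch of how a
   Big5 byte pair can be mapped onto two 94x94 charsets and back, using
   the same arithmetic as the DECODE_BIG5 and ENCODE_BIG5 macros in this
   file. The names `big5_to_emacs', `emacs_to_big5' and SKETCH_BIG5_ROW
   are hypothetical. The split at lead byte 0xC9 is inferred from the
   constant (0xC9 - 0xA1) in those macros; the line that tests it is
   not visible here. */

```c
#include <assert.h>

/* Number of Big5 characters sharing one lead byte: 94 + 63 = 157. */
#define SKETCH_BIG5_ROW (0xFF - 0xA1 + 0x7F - 0x40)

/* Map Big5 bytes B1, B2 to position-codes *C1, *C2. Return 1 or 2 to
   say which of the two sub-charsets the character falls in. */
int
big5_to_emacs (int b1, int b2, int *c1, int *c2)
{
  int which = 1;
  int temp;

  assert (b1 >= 0xA1 && b1 <= 0xFE);
  assert ((b2 >= 0x40 && b2 <= 0x7E) || (b2 >= 0xA1 && b2 <= 0xFE));
  /* Linear index of the character within the whole Big5 table. */
  temp = (b1 - 0xA1) * SKETCH_BIG5_ROW + b2 - (b2 < 0x7F ? 0x40 : 0x62);
  if (b1 >= 0xC9)
    {
      which = 2;
      temp -= (0xC9 - 0xA1) * SKETCH_BIG5_ROW;
    }
  *c1 = temp / (0xFF - 0xA1) + 0x21;
  *c2 = temp % (0xFF - 0xA1) + 0x21;
  return which;
}

/* Inverse of big5_to_emacs. */
void
emacs_to_big5 (int which, int c1, int c2, int *b1, int *b2)
{
  int temp = (c1 - 0x21) * (0xFF - 0xA1) + (c2 - 0x21);

  if (which == 2)
    temp += SKETCH_BIG5_ROW * (0xC9 - 0xA1);
  *b1 = temp / SKETCH_BIG5_ROW + 0xA1;
  *b2 = temp % SKETCH_BIG5_ROW;
  *b2 += *b2 < 0x3F ? 0x40 : 0x62;
}
```

/* The idea is to flatten a 157-column Big5 row layout into a linear
   index and re-cut it into 94-column rows, so both directions are
   plain division and remainder. */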
/* Macros to decode or encode a character of Big5 in BIG5. B1 and B2
   are the 1st and 2nd position-codes of Big5 in BIG5 coding system.
   C1 and C2 are the 1st and 2nd position-codes of Emacs' internal
   format. CHARSET is `charset_big5_1' or `charset_big5_2'. */
/* Number of Big5 characters which have the same code in 1st byte. */
#define BIG5_SAME_ROW (0xFF - 0xA1 + 0x7F - 0x40)
#define DECODE_BIG5(b1, b2, charset, c1, c2) \
  = (b1 - 0xA1) * BIG5_SAME_ROW + b2 - (b2 < 0x7F ? 0x40 : 0x62); \
  charset = charset_big5_1; \
  charset = charset_big5_2; \
  temp -= (0xC9 - 0xA1) * BIG5_SAME_ROW; \
  c1 = temp / (0xFF - 0xA1) + 0x21; \
  c2 = temp % (0xFF - 0xA1) + 0x21; \
#define ENCODE_BIG5(charset, c1, c2, b1, b2) \
  unsigned int temp = (c1 - 0x21) * (0xFF - 0xA1) + (c2 - 0x21); \
  if (charset == charset_big5_2) \
  temp += BIG5_SAME_ROW * (0xC9 - 0xA1); \
  b1 = temp / BIG5_SAME_ROW + 0xA1; \
  b2 = temp % BIG5_SAME_ROW; \
  b2 += b2 < 0x3F ? 0x40 : 0x62; \
/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
   Check if a text is encoded in SJIS. If it is, return
   CODING_CATEGORY_MASK_SJIS, else return 0. */
detect_coding_sjis (src, src_end, multibytep)
unsigned char *src, *src_end;
/* Dummy for ONE_MORE_BYTE. */
struct coding_system dummy_coding;
struct coding_system *coding = &dummy_coding;
ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
if (c == 0x80 || c == 0xA0 || c > 0xEF)
if (c <= 0x9F || c >= 0xE0)
ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
if (c < 0x40 || c == 0x7F || c > 0xFC)
return CODING_CATEGORY_MASK_SJIS;
/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
   Check if a text is encoded in BIG5. If it is, return
   CODING_CATEGORY_MASK_BIG5, else return 0. */
detect_coding_big5 (src, src_end, multibytep)
unsigned char *src, *src_end;
/* Dummy for ONE_MORE_BYTE. */
struct coding_system dummy_coding;
struct coding_system *coding = &dummy_coding;
ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
if (c < 0xA1 || c > 0xFE)
ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
if (c < 0x40 || (c > 0x7F && c < 0xA1) || c > 0xFE)
return CODING_CATEGORY_MASK_BIG5;
/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
   Check if a text is encoded in UTF-8. If it is, return
   CODING_CATEGORY_MASK_UTF_8, else return 0. */
#define UTF_8_1_OCTET_P(c) ((c) < 0x80)
#define UTF_8_EXTRA_OCTET_P(c) (((c) & 0xC0) == 0x80)
#define UTF_8_2_OCTET_LEADING_P(c) (((c) & 0xE0) == 0xC0)
#define UTF_8_3_OCTET_LEADING_P(c) (((c) & 0xF0) == 0xE0)
#define UTF_8_4_OCTET_LEADING_P(c) (((c) & 0xF8) == 0xF0)
#define UTF_8_5_OCTET_LEADING_P(c) (((c) & 0xFC) == 0xF8)
#define UTF_8_6_OCTET_LEADING_P(c) (((c) & 0xFE) == 0xFC)
detect_coding_utf_8 (src, src_end, multibytep)
unsigned char *src, *src_end;
int seq_maybe_bytes;
/* Dummy for ONE_MORE_BYTE. */
struct coding_system dummy_coding;
struct coding_system *coding = &dummy_coding;
ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
if (UTF_8_1_OCTET_P (c))
else if (UTF_8_2_OCTET_LEADING_P (c))
seq_maybe_bytes = 1;
else if (UTF_8_3_OCTET_LEADING_P (c))
seq_maybe_bytes = 2;
else if (UTF_8_4_OCTET_LEADING_P (c))
seq_maybe_bytes = 3;
else if (UTF_8_5_OCTET_LEADING_P (c))
seq_maybe_bytes = 4;
else if (UTF_8_6_OCTET_LEADING_P (c))
seq_maybe_bytes = 5;
ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
if (!UTF_8_EXTRA_OCTET_P (c))
while (seq_maybe_bytes > 0);
return CODING_CATEGORY_MASK_UTF_8;
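/* Illustration only, not part of Emacs: a standalone sketch of the
   structural UTF-8 check performed above, using the same bit patterns
   as the UTF_8_XXX macros. The function names are hypothetical. Like
   the macros, it accepts the historical 5- and 6-octet forms and does
   not reject overlong encodings. Treating a sequence cut off at the
   end of input as a mismatch is a choice made for this sketch; what
   ONE_MORE_BYTE_CHECK_MULTIBYTE does there is not visible in this
   section. */

```c
#include <assert.h>
#include <stddef.h>

/* Return the number of trailing octets announced by lead byte C,
   0 for a one-octet character, or -1 if C cannot start a sequence. */
static int
utf8_trailing_octets (unsigned char c)
{
  if (c < 0x80) return 0;
  if ((c & 0xE0) == 0xC0) return 1;
  if ((c & 0xF0) == 0xE0) return 2;
  if ((c & 0xF8) == 0xF0) return 3;
  if ((c & 0xFC) == 0xF8) return 4;
  if ((c & 0xFE) == 0xFC) return 5;
  return -1;
}

/* Return 1 if the LEN bytes at SRC are structurally valid UTF-8,
   else 0. */
int
looks_like_utf8 (const unsigned char *src, size_t len)
{
  size_t i = 0;

  assert (src != NULL || len == 0);
  while (i < len)
    {
      int n = utf8_trailing_octets (src[i++]);

      if (n < 0)
        return 0;               /* stray trailing octet, 0xFE or 0xFF */
      while (n-- > 0)
        {
          if (i >= len)
            return 0;           /* truncated sequence */
          if ((src[i++] & 0xC0) != 0x80)
            return 0;           /* not of the form 10xxxxxx */
        }
    }
  return 1;
}
```

/* Pure ASCII passes trivially, so a check like this can only rule
   UTF-8 out; it never proves that a text was meant as UTF-8. */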
/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
   Check if a text is encoded in UTF-16 Big Endian (endian == 1) or
   Little Endian (otherwise). If it is, return
   CODING_CATEGORY_MASK_UTF_16_BE or CODING_CATEGORY_MASK_UTF_16_LE,
#define UTF_16_INVALID_P(val) \
  (((val) == 0xFFFE) \
   || ((val) == 0xFFFF))
#define UTF_16_HIGH_SURROGATE_P(val) \
  (((val) & 0xFC00) == 0xD800)
#define UTF_16_LOW_SURROGATE_P(val) \
  (((val) & 0xFC00) == 0xDC00)
detect_coding_utf_16 (src, src_end, multibytep)
unsigned char *src, *src_end;
unsigned char c1, c2;
/* Dummy for ONE_MORE_BYTE_CHECK_MULTIBYTE. */
struct coding_system dummy_coding;
struct coding_system *coding = &dummy_coding;
ONE_MORE_BYTE_CHECK_MULTIBYTE (c1, multibytep);
ONE_MORE_BYTE_CHECK_MULTIBYTE (c2, multibytep);
if ((c1 == 0xFF) && (c2 == 0xFE))
return CODING_CATEGORY_MASK_UTF_16_LE;
else if ((c1 == 0xFE) && (c2 == 0xFF))
return CODING_CATEGORY_MASK_UTF_16_BE;
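/* Illustration only, not part of Emacs: a standalone sketch of
   byte-order-mark detection as done above, plus surrogate tests. All
   names are hypothetical. The surrogate tests use the mask 0xFC00,
   which selects exactly U+D800 .. U+DBFF (high) and U+DC00 .. U+DFFF
   (low) as defined by the UTF-16 encoding form. */

```c
#include <assert.h>
#include <stddef.h>

enum utf16_order { UTF16_UNKNOWN, UTF16_LE, UTF16_BE };

/* Classify the first two bytes of SRC as a UTF-16 byte order mark. */
enum utf16_order
utf16_bom_order (const unsigned char *src, size_t len)
{
  assert (src != NULL || len == 0);
  if (len < 2)
    return UTF16_UNKNOWN;
  if (src[0] == 0xFF && src[1] == 0xFE)
    return UTF16_LE;            /* U+FEFF stored low byte first */
  if (src[0] == 0xFE && src[1] == 0xFF)
    return UTF16_BE;            /* U+FEFF stored high byte first */
  return UTF16_UNKNOWN;
}

/* Nonzero if the 16-bit code unit VAL is a high (leading) surrogate. */
int
utf16_high_surrogate_p (unsigned int val)
{
  return (val & 0xFC00) == 0xD800;
}

/* Nonzero if the 16-bit code unit VAL is a low (trailing) surrogate. */
int
utf16_low_surrogate_p (unsigned int val)
{
  return (val & 0xFC00) == 0xDC00;
}
```

/* Without a byte order mark this sketch reports UTF16_UNKNOWN; it does
   not try to guess the byte order from the content. */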
/* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions".
   If SJIS_P is 1, decode SJIS text, else decode BIG5 text. */
decode_coding_sjis_big5 (coding, source, destination,
src_bytes, dst_bytes, sjis_p)
struct coding_system *coding;
unsigned char *source, *destination;
int src_bytes, dst_bytes;
unsigned char *src = source;
unsigned char *src_end = source + src_bytes;
unsigned char *dst = destination;
unsigned char *dst_end = destination + dst_bytes;
/* SRC_BASE remembers the start position in source in each loop.
   The loop will be exited when there's not enough source code
   (within macro ONE_MORE_BYTE), or when there's not enough
   destination area to produce a character (within macro
unsigned char *src_base;
Lisp_Object translation_table;
if (NILP (Venable_character_translation))
translation_table = Qnil;
translation_table = coding->translation_table_for_decode;
if (NILP (translation_table))
translation_table = Vstandard_translation_table_for_decode;
coding->produced_char = 0;
int c, charset, c1, c2;
charset = CHARSET_ASCII;
if (coding->eol_type == CODING_EOL_CRLF)
/* To process C2 again, SRC is subtracted by 1. */
else if (coding->eol_type == CODING_EOL_CR)
&& (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
&& (coding->eol_type == CODING_EOL_CR
|| coding->eol_type == CODING_EOL_CRLF))
coding->result = CODING_FINISH_INCONSISTENT_EOL;
goto label_end_of_loop;
if (c1 == 0x80 || c1 == 0xA0 || c1 > 0xEF)
goto label_invalid_code;
if (c1 <= 0x9F || c1 >= 0xE0)
/* SJIS -> JISX0208 */
if (c2 < 0x40 || c2 == 0x7F || c2 > 0xFC)
goto label_invalid_code;
DECODE_SJIS (c1, c2, c1, c2);
charset = charset_jisx0208;
/* SJIS -> JISX0201-Kana */
charset = charset_katakana_jisx0201;
if (c1 < 0xA0 || c1 > 0xFE)
goto label_invalid_code;
if (c2 < 0x40 || (c2 > 0x7E && c2 < 0xA1) || c2 > 0xFE)
goto label_invalid_code;
DECODE_BIG5 (c1, c2, charset, c1, c2);
c = DECODE_ISO_CHARACTER (charset, c1, c2);
coding->consumed = coding->consumed_char = src_base - source;
coding->produced = dst - destination;
/* See the above "GENERAL NOTES on `encode_coding_XXX ()' functions".
   This function can encode charsets `ascii', `katakana-jisx0201',
   `japanese-jisx0208', `chinese-big5-1', and `chinese-big5-2'. We
   are sure that all these charsets are registered as official charset
   (i.e. do not have extended leading-codes). Characters of other
   charsets are produced without any encoding. If SJIS_P is 1, encode
   SJIS text, else encode BIG5 text. */
encode_coding_sjis_big5 (coding, source, destination,
src_bytes, dst_bytes, sjis_p)
struct coding_system *coding;
unsigned char *source, *destination;
int src_bytes, dst_bytes;
unsigned char *src = source;
unsigned char *src_end = source + src_bytes;
unsigned char *dst = destination;
unsigned char *dst_end = destination + dst_bytes;
/* SRC_BASE remembers the start position in source in each loop.
   The loop will be exited when there's not enough source text to
   analyze multi-byte codes (within macro ONE_MORE_CHAR), or when
   there's not enough destination area to produce encoded codes
   (within macro EMIT_BYTES). */
unsigned char *src_base;
Lisp_Object translation_table;
if (NILP (Venable_character_translation))
translation_table = Qnil;
translation_table = coding->translation_table_for_encode;
if (NILP (translation_table))
translation_table = Vstandard_translation_table_for_encode;
int c, charset, c1, c2;
/* Now encode the character C. */
if (SINGLE_BYTE_CHAR_P (c))
if (!(coding->mode & CODING_MODE_SELECTIVE_DISPLAY))
if (coding->eol_type == CODING_EOL_CRLF)
EMIT_TWO_BYTES ('\r', c);
else if (coding->eol_type == CODING_EOL_CR)
SPLIT_CHAR (c, charset, c1, c2);
if (charset == charset_jisx0208
|| charset == charset_jisx0208_1978)
ENCODE_SJIS (c1, c2, c1, c2);
EMIT_TWO_BYTES (c1, c2);
else if (charset == charset_katakana_jisx0201)
EMIT_ONE_BYTE (c1 | 0x80);
else if (charset == charset_latin_jisx0201)
else if (coding->mode & CODING_MODE_INHIBIT_UNENCODABLE_CHAR)
EMIT_ONE_BYTE (CODING_REPLACEMENT_CHARACTER);
if (CHARSET_WIDTH (charset) > 1)
EMIT_ONE_BYTE (CODING_REPLACEMENT_CHARACTER);
/* There's no way other than producing the internal
EMIT_BYTES (src_base, src);
if (charset == charset_big5_1 || charset == charset_big5_2)
ENCODE_BIG5 (charset, c1, c2, c1, c2);
EMIT_TWO_BYTES (c1, c2);
else if (coding->mode & CODING_MODE_INHIBIT_UNENCODABLE_CHAR)
EMIT_ONE_BYTE (CODING_REPLACEMENT_CHARACTER);
if (CHARSET_WIDTH (charset) > 1)
EMIT_ONE_BYTE (CODING_REPLACEMENT_CHARACTER);
/* There's no way other than producing the internal
EMIT_BYTES (src_base, src);
coding->consumed_char++;
coding->consumed = src_base - source;
coding->produced = coding->produced_char = dst - destination;
/*** 5. CCL handlers ***/
/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
   Check if a text is encoded in a coding system of which
   encoder/decoder are written in CCL program. If it is, return
   CODING_CATEGORY_MASK_CCL, else return 0. */
detect_coding_ccl (src, src_end, multibytep)
unsigned char *src, *src_end;
unsigned char *valid;
/* Dummy for ONE_MORE_BYTE. */
struct coding_system dummy_coding;
struct coding_system *coding = &dummy_coding;
/* No coding system is assigned to coding-category-ccl. */
if (!coding_system_table[CODING_CATEGORY_IDX_CCL])
valid = coding_system_table[CODING_CATEGORY_IDX_CCL]->spec.ccl.valid_codes;
ONE_MORE_BYTE_CHECK_MULTIBYTE (c, multibytep);
return CODING_CATEGORY_MASK_CCL;
/*** 6. End-of-line handlers ***/
/* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions". */
decode_eol (coding, source, destination, src_bytes, dst_bytes)
struct coding_system *coding;
unsigned char *source, *destination;
int src_bytes, dst_bytes;
unsigned char *src = source;
unsigned char *dst = destination;
unsigned char *src_end = src + src_bytes;
unsigned char *dst_end = dst + dst_bytes;
Lisp_Object translation_table;
/* SRC_BASE remembers the start position in source in each loop.
   The loop will be exited when there's not enough source code
   (within macro ONE_MORE_BYTE), or when there's not enough
   destination area to produce a character (within macro
unsigned char *src_base;
translation_table = Qnil;
switch (coding->eol_type)
case CODING_EOL_CRLF:
&& (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL))
coding->result = CODING_FINISH_INCONSISTENT_EOL;
goto label_end_of_loop;
if (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
coding->result = CODING_FINISH_INCONSISTENT_EOL;
goto label_end_of_loop;
default: /* no need for EOL handling */
coding->consumed = coding->consumed_char = src_base - source;
coding->produced = dst - destination;
/* See "GENERAL NOTES about `encode_coding_XXX ()' functions". Encode
   format of end-of-line according to `coding->eol_type'. It also
   converts multibyte form 8-bit characters to unibyte if
   CODING->src_multibyte is nonzero. If `coding->mode &
   CODING_MODE_SELECTIVE_DISPLAY' is nonzero, code '\r' in source text
   also means end-of-line. */
encode_eol (coding, source, destination, src_bytes, dst_bytes)
struct coding_system *coding;
const unsigned char *source;
unsigned char *destination;
int src_bytes, dst_bytes;
const unsigned char *src = source;
unsigned char *dst = destination;
const unsigned char *src_end = src + src_bytes;
unsigned char *dst_end = dst + dst_bytes;
Lisp_Object translation_table;
/* SRC_BASE remembers the start position in source in each loop.
   The loop will be exited when there's not enough source text to
   analyze multi-byte codes (within macro ONE_MORE_CHAR), or when
   there's not enough destination area to produce encoded codes
   (within macro EMIT_BYTES). */
const unsigned char *src_base;
int selective_display = coding->mode & CODING_MODE_SELECTIVE_DISPLAY;
translation_table = Qnil;
if (coding->src_multibyte
&& *(src_end - 1) == LEADING_CODE_8_BIT_CONTROL)
coding->result = CODING_FINISH_INSUFFICIENT_SRC;
if (coding->eol_type == CODING_EOL_CRLF)
while (src < src_end)
else if (c == '\n' || (c == '\r' && selective_display))
EMIT_TWO_BYTES ('\r', '\n');
if (!dst_bytes || src_bytes <= dst_bytes)
safe_bcopy (src, dst, src_bytes);
if (coding->src_multibyte
&& *(src + dst_bytes - 1) == LEADING_CODE_8_BIT_CONTROL)
safe_bcopy (src, dst, dst_bytes);
src_base = src + dst_bytes;
dst = destination + dst_bytes;
coding->result = CODING_FINISH_INSUFFICIENT_DST;
if (coding->eol_type == CODING_EOL_CR)
for (tmp = destination; tmp < dst; tmp++)
if (*tmp == '\n') *tmp = '\r';
else if (selective_display)
for (tmp = destination; tmp < dst; tmp++)
if (*tmp == '\r') *tmp = '\n';
if (coding->src_multibyte)
dst = destination + str_as_unibyte (destination, dst - destination);
coding->consumed = src_base - source;
coding->produced = dst - destination;
coding->produced_char = coding->produced;
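/* Illustration only, not part of Emacs: a standalone sketch of the
   end-of-line encoding step, turning internal '\n' into LF, CRLF or
   CR. The names `eol_type' and `encode_eol_sketch' are hypothetical.
   Selective display, multibyte-to-unibyte conversion and the partial
   result bookkeeping of the real function are left out on purpose. */

```c
#include <assert.h>
#include <stddef.h>

enum eol_type { EOL_LF, EOL_CRLF, EOL_CR };

/* Copy SRC_LEN bytes from SRC to DST, rewriting each '\n' according to
   EOL. Return the number of bytes written, or (size_t) -1 if DST_LEN
   bytes are not enough. */
size_t
encode_eol_sketch (enum eol_type eol,
                   const unsigned char *src, size_t src_len,
                   unsigned char *dst, size_t dst_len)
{
  size_t i, n = 0;

  assert (src != NULL && dst != NULL);
  for (i = 0; i < src_len; i++)
    {
      unsigned char c = src[i];

      if (c == '\n' && eol == EOL_CRLF)
        {
          if (dst_len - n < 2)
            return (size_t) -1;
          dst[n++] = '\r';
          dst[n++] = '\n';
        }
      else
        {
          if (n >= dst_len)
            return (size_t) -1;
          dst[n++] = (c == '\n' && eol == EOL_CR) ? '\r' : c;
        }
    }
  return n;
}
```

/* Only the CRLF case can grow the text, which is why the function
   above has a separate loop for it and can use a plain block copy for
   the LF and CR cases. */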
3468 /*** 7. C library functions ***/
3470 /* In Emacs Lisp, a coding system is represented by a Lisp symbol which
3471 has a property `coding-system'. The value of this property is a
3472 vector of length 5 (called the coding-vector). Among elements of
3473 this vector, the first (element[0]) and the fifth (element[4])
3474 carry important information for decoding/encoding. Before
3475 decoding/encoding, this information should be set in fields of a
3476 structure of type `coding_system'.
3478 The value of the property `coding-system' can be a symbol of another
3479 subsidiary coding-system. In that case, Emacs gets coding-vector

   `element[0]' contains information to be set in `coding->type'.  The
   values and their meanings are as follows:

   0 -- coding_type_emacs_mule
   1 -- coding_type_sjis
   2 -- coding_type_iso2022
   3 -- coding_type_big5
   4 -- coding_type_ccl encoder/decoder written in CCL
   nil -- coding_type_no_conversion
   t -- coding_type_undecided (automatic conversion on decoding,
                               no-conversion on encoding)

   `element[4]' contains information to be set in `coding->flags' and
   `coding->spec'.  The meaning varies by `coding->type'.

   If `coding->type' is `coding_type_iso2022', element[4] is a vector
   of length 32 (of which the first 17 sub-elements are used now).
   Meanings of these sub-elements are:

   sub-element[N] where N is 0 through 3: to be set in `coding->spec.iso2022'
        If the value is an integer of valid charset, the charset is
        assumed to be designated to graphic register N initially.

        If the value is negative, it is the negated value of a charset
        which reserves graphic register N, which means that the charset
        is not designated initially but should be designated to graphic
        register N just before encoding a character in that charset.

        If the value is nil, graphic register N is never used on
        encoding.

   sub-element[N] where N is 4 through 16: to be set in `coding->flags'
        Each value takes t or nil.  See the section ISO2022 of
        `coding.h' for more information.

   If `coding->type' is `coding_type_big5', element[4] is t to denote
   BIG5-ETen or nil to denote BIG5-HKU.

   If `coding->type' takes any other value, element[4] is ignored.

   Emacs Lisp's coding systems also carry information about the format
   of end-of-line in the value of the property `eol-type'.  If the value
   is an integer, 0 means CODING_EOL_LF, 1 means CODING_EOL_CRLF, and 2
   means CODING_EOL_CR.  If it is not an integer, it should be a vector
   of subsidiary coding systems whose property `eol-type' has one
   of the above values.

*/
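
/* Illustrative sketch, not part of the original source: how an
   `eol-type' property value maps onto an end-of-line kind, following
   the rules described above and the checks made in
   setup_coding_system.  The enum, its numeric value for "undecided",
   and the function name are hypothetical; the Lisp value is modeled
   as an (IS_INTEGER, VALUE) pair because Lisp_Object is not available
   in a standalone example.  */

```c
/* Hypothetical mirror of the eol-type numbering (0, 1, 2).  */
enum sketch_prop_eol
{
  SKETCH_PROP_EOL_LF = 0,
  SKETCH_PROP_EOL_CRLF = 1,
  SKETCH_PROP_EOL_CR = 2,
  SKETCH_PROP_EOL_UNDECIDED = 3   /* value chosen for this sketch */
};

/* Map an `eol-type' property to an end-of-line kind.  IS_INTEGER is
   nonzero when the property value is an integer VALUE; otherwise the
   value is taken to be a vector of subsidiary coding systems, which
   means the end-of-line format still has to be detected.  */
enum sketch_prop_eol
sketch_eol_from_property (int is_integer, int value)
{
  if (!is_integer)
    return SKETCH_PROP_EOL_UNDECIDED;
  switch (value)
    {
    case 1:
      return SKETCH_PROP_EOL_CRLF;
    case 2:
      return SKETCH_PROP_EOL_CR;
    default:
      /* 0, and anything unexpected, is treated as LF.  */
      return SKETCH_PROP_EOL_LF;
    }
}
```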
/* Extract information for decoding/encoding from CODING_SYSTEM_SYMBOL
   and set it in CODING.  If CODING_SYSTEM_SYMBOL is invalid, CODING
   is set up so that no conversion is necessary and -1 is returned, else
setup_coding_system (coding_system, coding)
     Lisp_Object coding_system;
     struct coding_system *coding;
  Lisp_Object coding_spec, coding_type, eol_type, plist;

  /* At first, zero clear all members.  */
  bzero (coding, sizeof (struct coding_system));

  /* Initialize some fields required for all kinds of coding systems.  */
  coding->symbol = coding_system;
  coding->heading_ascii = -1;
  coding->post_read_conversion = coding->pre_write_conversion = Qnil;
  coding->composing = COMPOSITION_DISABLED;
  coding->cmp_data = NULL;

  if (NILP (coding_system))
    goto label_invalid_coding_system;

  coding_spec = Fget (coding_system, Qcoding_system);

  if (!VECTORP (coding_spec)
      || XVECTOR (coding_spec)->size != 5
      || !CONSP (XVECTOR (coding_spec)->contents[3]))
    goto label_invalid_coding_system;

  eol_type = inhibit_eol_conversion ? Qnil : Fget (coding_system, Qeol_type);
  if (VECTORP (eol_type))
      coding->eol_type = CODING_EOL_UNDECIDED;
      coding->common_flags = CODING_REQUIRE_DETECTION_MASK;
  else if (XFASTINT (eol_type) == 1)
      coding->eol_type = CODING_EOL_CRLF;
      coding->common_flags
        = CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
  else if (XFASTINT (eol_type) == 2)
      coding->eol_type = CODING_EOL_CR;
      coding->common_flags
        = CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
    coding->eol_type = CODING_EOL_LF;

  coding_type = XVECTOR (coding_spec)->contents[0];
  /* Try short cut.  */
  if (SYMBOLP (coding_type))
      if (EQ (coding_type, Qt))
          coding->type = coding_type_undecided;
          coding->common_flags |= CODING_REQUIRE_DETECTION_MASK;
        coding->type = coding_type_no_conversion;
      /* Initialize this member.  Anything other than
         CODING_CATEGORY_IDX_UTF_16_BE and
         CODING_CATEGORY_IDX_UTF_16_LE is OK because those two have
         special treatment in detect_eol.  */
      coding->category_idx = CODING_CATEGORY_IDX_EMACS_MULE;

  /* Get values of coding system properties:
     `post-read-conversion', `pre-write-conversion',
     `translation-table-for-decode', `translation-table-for-encode'.  */
  plist = XVECTOR (coding_spec)->contents[3];
  /* Pre & post conversion functions should be disabled if
     inhibit_pre_post_conversion is nonzero.  This is the case when a
     code conversion function is called while those functions are
     running.  */
  if (! inhibit_pre_post_conversion)
      coding->post_read_conversion = Fplist_get (plist, Qpost_read_conversion);
      coding->pre_write_conversion = Fplist_get (plist, Qpre_write_conversion);
  val = Fplist_get (plist, Qtranslation_table_for_decode);
    val = Fget (val, Qtranslation_table_for_decode);
  coding->translation_table_for_decode = CHAR_TABLE_P (val) ? val : Qnil;
  val = Fplist_get (plist, Qtranslation_table_for_encode);
    val = Fget (val, Qtranslation_table_for_encode);
  coding->translation_table_for_encode = CHAR_TABLE_P (val) ? val : Qnil;
  val = Fplist_get (plist, Qcoding_category);
      val = Fget (val, Qcoding_category_index);
        coding->category_idx = XINT (val);
        goto label_invalid_coding_system;
    goto label_invalid_coding_system;

  /* If the coding system has non-nil `composition' property, enable
     composition handling.  */
  val = Fplist_get (plist, Qcomposition);
    coding->composing = COMPOSITION_NO;

  switch (XFASTINT (coding_type))
      coding->type = coding_type_emacs_mule;
      coding->common_flags
        |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
      if (!NILP (coding->post_read_conversion))
        coding->common_flags |= CODING_REQUIRE_DECODING_MASK;
      if (!NILP (coding->pre_write_conversion))
        coding->common_flags |= CODING_REQUIRE_ENCODING_MASK;

      coding->type = coding_type_sjis;
      coding->common_flags
        |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;

      coding->type = coding_type_iso2022;
      coding->common_flags
        |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
        Lisp_Object val, temp;
        int i, charset, reg_bits = 0;

        val = XVECTOR (coding_spec)->contents[4];

        if (!VECTORP (val) || XVECTOR (val)->size != 32)
          goto label_invalid_coding_system;

        flags = XVECTOR (val)->contents;
          = ((NILP (flags[4]) ? 0 : CODING_FLAG_ISO_SHORT_FORM)
             | (NILP (flags[5]) ? 0 : CODING_FLAG_ISO_RESET_AT_EOL)
             | (NILP (flags[6]) ? 0 : CODING_FLAG_ISO_RESET_AT_CNTL)
             | (NILP (flags[7]) ? 0 : CODING_FLAG_ISO_SEVEN_BITS)
             | (NILP (flags[8]) ? 0 : CODING_FLAG_ISO_LOCKING_SHIFT)
             | (NILP (flags[9]) ? 0 : CODING_FLAG_ISO_SINGLE_SHIFT)
             | (NILP (flags[10]) ? 0 : CODING_FLAG_ISO_USE_ROMAN)
             | (NILP (flags[11]) ? 0 : CODING_FLAG_ISO_USE_OLDJIS)
             | (NILP (flags[12]) ? 0 : CODING_FLAG_ISO_NO_DIRECTION)
             | (NILP (flags[13]) ? 0 : CODING_FLAG_ISO_INIT_AT_BOL)
             | (NILP (flags[14]) ? 0 : CODING_FLAG_ISO_DESIGNATE_AT_BOL)
             | (NILP (flags[15]) ? 0 : CODING_FLAG_ISO_SAFE)
             | (NILP (flags[16]) ? 0 : CODING_FLAG_ISO_LATIN_EXTRA)

        /* Invoke graphic register 0 to plane 0.  */
        CODING_SPEC_ISO_INVOCATION (coding, 0) = 0;
        /* Invoke graphic register 1 to plane 1 if we can use full 8-bit.  */
        CODING_SPEC_ISO_INVOCATION (coding, 1)
          = (coding->flags & CODING_FLAG_ISO_SEVEN_BITS ? -1 : 1);
        /* Not single shifting at first.  */
        CODING_SPEC_ISO_SINGLE_SHIFTING (coding) = 0;
        /* Beginning of buffer should also be regarded as bol.  */
        CODING_SPEC_ISO_BOL (coding) = 1;

        for (charset = 0; charset <= MAX_CHARSET; charset++)
          CODING_SPEC_ISO_REVISION_NUMBER (coding, charset) = 255;
        val = Vcharset_revision_alist;
            charset = get_charset_id (Fcar_safe (XCAR (val)));
                && (temp = Fcdr_safe (XCAR (val)), INTEGERP (temp))
                && (i = XINT (temp), (i >= 0 && (i + '@') < 128)))
              CODING_SPEC_ISO_REVISION_NUMBER (coding, charset) = i;

        /* Check FLAGS[REG] (REG = 0, 1, 2, 3) and decide designations.
           FLAGS[REG] can be one of the following:
                integer CHARSET: CHARSET occupies register REG,
                t: designate nothing to REG initially, but can be used
                  by any charsets,
                list of integer, nil, or t: designate the first
                  element (if integer) to REG initially, the remaining
                  elements (if integer) are designated to REG on request,
                  if an element is t, REG can be used by any charsets,
                nil: REG is never used.  */
        for (charset = 0; charset <= MAX_CHARSET; charset++)
          CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
            = CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION;
        for (i = 0; i < 4; i++)
            if ((INTEGERP (flags[i])
                 && (charset = XINT (flags[i]), CHARSET_VALID_P (charset)))
                || (charset = get_charset_id (flags[i])) >= 0)
                CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = charset;
                CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) = i;
            else if (EQ (flags[i], Qt))
                CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = -1;
                coding->flags |= CODING_FLAG_ISO_DESIGNATION;
            else if (CONSP (flags[i]))
                coding->flags |= CODING_FLAG_ISO_DESIGNATION;
                if ((INTEGERP (XCAR (tail))
                     && (charset = XINT (XCAR (tail)),
                         CHARSET_VALID_P (charset)))
                    || (charset = get_charset_id (XCAR (tail))) >= 0)
                    CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = charset;
                    CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) = i;
                  CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = -1;
                while (CONSP (tail))
                    if ((INTEGERP (XCAR (tail))
                         && (charset = XINT (XCAR (tail)),
                             CHARSET_VALID_P (charset)))
                        || (charset = get_charset_id (XCAR (tail))) >= 0)
                      CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
                    else if (EQ (XCAR (tail), Qt))
              CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = -1;

            CODING_SPEC_ISO_DESIGNATION (coding, i)
              = CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i);

        if (reg_bits && ! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT))
            /* REG 1 can be used only by locking shift in 7-bit env.  */
            if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS)
            if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT))
              /* Without any shifting, only REG 0 and 1 can be used.  */
            for (charset = 0; charset <= MAX_CHARSET; charset++)
                if (CHARSET_DEFINED_P (charset)
                    && (CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
                        == CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION))
                    /* There exist some default graphic registers to be
                    /* We had better avoid designating a charset of
                       CHARS96 to REG 0 as far as possible.  */
                    if (CHARSET_CHARS (charset) == 96)
                      CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
                           ? 1 : (reg_bits & 4 ? 2 : (reg_bits & 8 ? 3 : 0)));
                      CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
                           ? 0 : (reg_bits & 2 ? 1 : (reg_bits & 4 ? 2 : 3)));
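
/* Illustrative sketch, not part of the original source: choosing a
   default graphic register from REG_BITS, where bit N set means
   register N may be used.  The function name is hypothetical.  The
   leading tests (`reg_bits & 2' for 96-character sets and
   `reg_bits & 1' otherwise) are an assumption inferred from the
   symmetry of the visible ternary chains, since those lines are
   missing here.  */

```c
/* Return the default register for a charset with CHARS characters per
   dimension (94 or 96).  REG_BITS has bit N set when graphic register
   N is available.  */
int
sketch_default_register (unsigned int reg_bits, int chars)
{
  if (chars == 96)
    /* Avoid register 0 for 96-character sets when possible: prefer
       registers 1, 2, 3 and fall back to 0.  */
    return (reg_bits & 2
            ? 1 : (reg_bits & 4 ? 2 : (reg_bits & 8 ? 3 : 0)));

  /* Otherwise prefer registers 0, 1, 2 and fall back to 3.  */
  return (reg_bits & 1
          ? 0 : (reg_bits & 2 ? 1 : (reg_bits & 4 ? 2 : 3)));
}
```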
      coding->common_flags |= CODING_REQUIRE_FLUSHING_MASK;
      coding->spec.iso2022.last_invalid_designation_register = -1;

      coding->type = coding_type_big5;
      coding->common_flags
        |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
        = (NILP (XVECTOR (coding_spec)->contents[4])
           ? CODING_FLAG_BIG5_HKU
           : CODING_FLAG_BIG5_ETEN);

      coding->type = coding_type_ccl;
      coding->common_flags
        |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
        val = XVECTOR (coding_spec)->contents[4];
            || setup_ccl_program (&(coding->spec.ccl.decoder),
            || setup_ccl_program (&(coding->spec.ccl.encoder),
          goto label_invalid_coding_system;

        bzero (coding->spec.ccl.valid_codes, 256);
        val = Fplist_get (plist, Qvalid_codes);
            for (; CONSP (val); val = XCDR (val))
                    && XINT (this) >= 0 && XINT (this) < 256)
                  coding->spec.ccl.valid_codes[XINT (this)] = 1;
                else if (CONSP (this)
                         && INTEGERP (XCAR (this))
                         && INTEGERP (XCDR (this)))
                    int start = XINT (XCAR (this));
                    int end = XINT (XCDR (this));

                    if (start >= 0 && start <= end && end < 256)
                      while (start <= end)
                        coding->spec.ccl.valid_codes[start++] = 1;
      coding->common_flags |= CODING_REQUIRE_FLUSHING_MASK;
      coding->spec.ccl.cr_carryover = 0;
      coding->spec.ccl.eight_bit_carryover[0] = 0;
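
/* Illustrative sketch, not part of the original source: filling a
   256-entry table of valid byte codes from single codes and from
   (START . END) ranges, as the `valid-codes' handling above does.
   The function names are hypothetical, and plain ints stand in for
   the Lisp integers of the real code.  */

```c
#include <string.h>

/* Clear all 256 entries of TABLE.  */
void
sketch_clear_valid_codes (unsigned char table[256])
{
  memset (table, 0, 256);
}

/* Mark the single byte value CODE as valid.  Values outside 0..255
   are ignored.  */
void
sketch_mark_valid_code (unsigned char table[256], int code)
{
  if (code >= 0 && code < 256)
    table[code] = 1;
}

/* Mark every byte value in the inclusive range START..END as valid.
   A range that is inverted or not within 0..255 is ignored as a
   whole.  */
void
sketch_mark_valid_range (unsigned char table[256], int start, int end)
{
  if (start >= 0 && start <= end && end < 256)
    while (start <= end)
      table[start++] = 1;
}
```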

      coding->type = coding_type_raw_text;

      goto label_invalid_coding_system;

 label_invalid_coding_system:
  coding->type = coding_type_no_conversion;
  coding->category_idx = CODING_CATEGORY_IDX_BINARY;
  coding->common_flags = 0;
  coding->eol_type = CODING_EOL_LF;
  coding->pre_write_conversion = coding->post_read_conversion = Qnil;

/* Free memory blocks allocated for storing composition information.  */

coding_free_composition_data (coding)
     struct coding_system *coding;
  struct composition_data *cmp_data = coding->cmp_data, *next;

  /* Memory blocks are chained.  First rewind to the first block,
     then free the blocks one by one.  */
  while (cmp_data->prev)
    cmp_data = cmp_data->prev;
      next = cmp_data->next;
  coding->cmp_data = NULL;
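
/* Illustrative sketch, not part of the original source: the
   rewind-then-free walk over a doubly linked chain of blocks.
   struct sketch_block and sketch_free_chain are hypothetical
   stand-ins for struct composition_data and the function above, and
   the standard free is used where the real code has its own
   allocator.  */

```c
#include <stdlib.h>

/* Hypothetical block type: only the links matter for this sketch.  */
struct sketch_block
{
  struct sketch_block *prev, *next;
};

/* Free every block of the chain that contains BLOCK, which may point
   anywhere in the chain.  Return the number of blocks freed.  */
int
sketch_free_chain (struct sketch_block *block)
{
  int count = 0;

  if (!block)
    return 0;
  /* Rewind to the first block.  */
  while (block->prev)
    block = block->prev;
  /* Free forward, saving the next link before each free.  */
  while (block)
    {
      struct sketch_block *next = block->next;

      free (block);
      block = next;
      count++;
    }
  return count;
}
```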

/* Set the `char_offset' member of all memory blocks pointed to by
   coding->cmp_data to POS.  */

coding_adjust_composition_offset (coding, pos)
     struct coding_system *coding;
  struct composition_data *cmp_data;

  for (cmp_data = coding->cmp_data; cmp_data; cmp_data = cmp_data->next)
    cmp_data->char_offset = pos;

/* Set up raw-text or one of its subsidiaries in the structure
   coding_system CODING according to the already set up value eol_type
   in CODING.  CODING should be set up for some coding system in

setup_raw_text_coding_system (coding)
     struct coding_system *coding;
  if (coding->type != coding_type_raw_text)
      coding->symbol = Qraw_text;
      coding->type = coding_type_raw_text;
      if (coding->eol_type != CODING_EOL_UNDECIDED)
          Lisp_Object subsidiaries;
          subsidiaries = Fget (Qraw_text, Qeol_type);

          if (VECTORP (subsidiaries)
              && XVECTOR (subsidiaries)->size == 3)
              = XVECTOR (subsidiaries)->contents[coding->eol_type];
      setup_coding_system (coding->symbol, coding);

/* Emacs has a mechanism to automatically detect a coding system if it
   is one of Emacs' internal format, ISO2022, SJIS, and BIG5.  But
   it's impossible to distinguish some coding systems accurately
   because they use the same range of codes.  So coding systems are
   first grouped into the following categories:

   o coding-category-emacs-mule

        The category for a coding system which has the same code range
        as Emacs' internal format.  Assigned the coding-system (Lisp
        symbol) `emacs-mule' by default.

   o coding-category-sjis

        The category for a coding system which has the same code range
        as SJIS.  Assigned the coding-system (Lisp
        symbol) `japanese-shift-jis' by default.

   o coding-category-iso-7

        The category for a coding system which has the same code range
        as ISO2022 of 7-bit environment.  This doesn't use any locking
        shift or single shift functions.  This can encode/decode all
        charsets.  Assigned the coding-system (Lisp symbol)
        `iso-2022-7bit' by default.

   o coding-category-iso-7-tight

        Same as coding-category-iso-7 except that this can
        encode/decode only the specified charsets.

   o coding-category-iso-8-1

        The category for a coding system which has the same code range
        as ISO2022 of 8-bit environment and graphic plane 1 used only
        for DIMENSION1 charset.  This doesn't use any locking shift
        or single shift functions.  Assigned the coding-system (Lisp
        symbol) `iso-latin-1' by default.

   o coding-category-iso-8-2

        The category for a coding system which has the same code range
        as ISO2022 of 8-bit environment and graphic plane 1 used only
        for DIMENSION2 charset.  This doesn't use any locking shift
        or single shift functions.  Assigned the coding-system (Lisp
        symbol) `japanese-iso-8bit' by default.

   o coding-category-iso-7-else

        The category for a coding system which has the same code range
        as ISO2022 of 7-bit environment but uses locking shift or
        single shift functions.  Assigned the coding-system (Lisp
        symbol) `iso-2022-7bit-lock' by default.

   o coding-category-iso-8-else

        The category for a coding system which has the same code range
        as ISO2022 of 8-bit environment but uses locking shift or
        single shift functions.  Assigned the coding-system (Lisp
        symbol) `iso-2022-8bit-ss2' by default.

   o coding-category-big5

        The category for a coding system which has the same code range
        as BIG5.  Assigned the coding-system (Lisp symbol)
        `cn-big5' by default.

   o coding-category-utf-8

        The category for a coding system which has the same code range
        as UTF-8 (cf. RFC2279).  Assigned the coding-system (Lisp
        symbol) `utf-8' by default.

   o coding-category-utf-16-be

        The category for a coding system in which a text has a
        Unicode signature (cf. Unicode Standard) in the order of BIG
        endian at the head.  Assigned the coding-system (Lisp symbol)
        `utf-16-be' by default.

   o coding-category-utf-16-le

        The category for a coding system in which a text has a
        Unicode signature (cf. Unicode Standard) in the order of
        LITTLE endian at the head.  Assigned the coding-system (Lisp
        symbol) `utf-16-le' by default.

   o coding-category-ccl

        The category for a coding system whose encoder/decoder is
        written in CCL programs.  The default value is nil, i.e., no
        coding system is assigned.

   o coding-category-binary

        The category for a coding system not categorized in any of the
        above.  Assigned the coding-system (Lisp symbol)
        `no-conversion' by default.

   Each of them is a Lisp symbol and the value is an actual
   `coding-system' (this is also a Lisp symbol) assigned by a user.
   What Emacs actually does is detect a category of coding system.
   Then, it uses the `coding-system' assigned to that category.  If
   Emacs can't decide on a single possible category, it selects the
   category of the highest priority.  Priorities of categories are
   also specified by a user in the Lisp variable `coding-category-list'.

*/

int ascii_skip_code[256];

/* Detect how a text of length SRC_BYTES pointed to by SOURCE is encoded.
   If it detects possible coding systems, return an integer in which
   appropriate flag bits are set.  Flag bits are defined by macros
   CODING_CATEGORY_MASK_XXX in `coding.h'.  If PRIORITIES is non-NULL,
   it should point to the table `coding_priorities'.  In that case, only
   the flag bit for a coding system of the highest priority is set in
   the returned value.  If MULTIBYTEP is nonzero, 8-bit codes of the
   range 0x80..0x9F are in multibyte form.

   How many ASCII characters are at the head is returned as *SKIP.  */

detect_coding_mask (source, src_bytes, priorities, skip, multibytep)
     unsigned char *source;
     int src_bytes, *priorities, *skip;
  register unsigned char c;
  unsigned char *src = source, *src_end = source + src_bytes;
  unsigned int mask, utf16_examined_p, iso2022_examined_p;

  /* At first, skip all ASCII characters and control characters except
     for three ISO2022 specific control characters.  */
  ascii_skip_code[ISO_CODE_SO] = 0;
  ascii_skip_code[ISO_CODE_SI] = 0;
  ascii_skip_code[ISO_CODE_ESC] = 0;

 label_loop_detect_coding:
  while (src < src_end && ascii_skip_code[*src]) src++;
  *skip = src - source;
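
/* Illustrative sketch, not part of the original source: a 256-entry
   skip table used to step over the leading run of plain ASCII.  The
   names are hypothetical.  How ascii_skip_code is initialized is not
   shown in this part of the file; the sketch assumes every 7-bit byte
   is skippable except ESC (0x1B), SO (0x0E), and SI (0x0F).  */

```c
#include <stddef.h>

#define SKETCH_CODE_ESC 0x1B
#define SKETCH_CODE_SO  0x0E
#define SKETCH_CODE_SI  0x0F

/* Fill TABLE so that TABLE[B] is nonzero exactly when byte B can be
   skipped: any 7-bit byte other than ESC, SO, and SI.  */
void
sketch_init_skip_table (int table[256])
{
  int i;

  for (i = 0; i < 256; i++)
    table[i] = i < 0x80;
  table[SKETCH_CODE_ESC] = 0;
  table[SKETCH_CODE_SO] = 0;
  table[SKETCH_CODE_SI] = 0;
}

/* Return how many leading bytes of SRC (SRC_BYTES long) are skippable
   according to TABLE.  */
size_t
sketch_skip_ascii (const int table[256], const unsigned char *src,
                   size_t src_bytes)
{
  size_t n = 0;

  while (n < src_bytes && table[src[n]])
    n++;
  return n;
}
```

/* A table lookup keeps the inner loop to one test per byte, and lets
   the caller re-enable ESC, SO, or SI as skippable later, as the code
   below does after a failed ISO2022 check.  */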
    /* We found nothing other than ASCII.  There's nothing to do.  */

  /* The text seems to be encoded in some multilingual coding system.
     Now, try to find in which coding system the text is encoded.  */
      /* i.e. (c == ISO_CODE_ESC || c == ISO_CODE_SI || c == ISO_CODE_SO) */
      /* C is an ISO2022 specific control code of C0.  */
      mask = detect_coding_iso2022 (src, src_end, multibytep);
          /* No valid ISO2022 code follows C.  Try again.  */
          if (c == ISO_CODE_ESC)
            ascii_skip_code[ISO_CODE_ESC] = 1;
            ascii_skip_code[ISO_CODE_SO] = ascii_skip_code[ISO_CODE_SI] = 1;
          goto label_loop_detect_coding;
          for (i = 0; i < CODING_CATEGORY_IDX_MAX; i++)
              if (mask & priorities[i])
                return priorities[i];
          return CODING_CATEGORY_MASK_RAW_TEXT;

      if (multibytep && c == LEADING_CODE_8_BIT_CONTROL)
        /* C is the first byte of SJIS character code,
           or a leading-code of Emacs' internal format (emacs-mule),
           or the first byte of UTF-16.  */
        try = (CODING_CATEGORY_MASK_SJIS
               | CODING_CATEGORY_MASK_EMACS_MULE
               | CODING_CATEGORY_MASK_UTF_16_BE
               | CODING_CATEGORY_MASK_UTF_16_LE);

        /* Or, if C is a special latin extra code,
           or is an ISO2022 specific control code of C1 (SS2 or SS3),
           or is an ISO2022 control-sequence-introducer (CSI),
           we should also consider the possibility of ISO2022 codings.  */
        if ((VECTORP (Vlatin_extra_code_table)
             && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c]))
            || (c == ISO_CODE_SS2 || c == ISO_CODE_SS3)
            || (c == ISO_CODE_CSI
                    || ((*src == '0' || *src == '1' || *src == '2')
                        && src + 1 < src_end
                        && src[1] == ']')))))
          try |= (CODING_CATEGORY_MASK_ISO_8_ELSE
                  | CODING_CATEGORY_MASK_ISO_8BIT);

        /* C is a character of ISO2022 in graphic plane right,
           or a SJIS's 1-byte character code (i.e. JISX0201),
           or the first byte of BIG5's 2-byte code,
           or the first byte of UTF-8/16.  */
        try = (CODING_CATEGORY_MASK_ISO_8_ELSE
               | CODING_CATEGORY_MASK_ISO_8BIT
               | CODING_CATEGORY_MASK_SJIS
               | CODING_CATEGORY_MASK_BIG5
               | CODING_CATEGORY_MASK_UTF_8
               | CODING_CATEGORY_MASK_UTF_16_BE
               | CODING_CATEGORY_MASK_UTF_16_LE);

      /* Or, we may have to consider the possibility of CCL.  */
      if (coding_system_table[CODING_CATEGORY_IDX_CCL]
          && (coding_system_table[CODING_CATEGORY_IDX_CCL]
              ->spec.ccl.valid_codes)[c])
        try |= CODING_CATEGORY_MASK_CCL;

      utf16_examined_p = iso2022_examined_p = 0;
          for (i = 0; i < CODING_CATEGORY_IDX_MAX; i++)
              if (!iso2022_examined_p
                  && (priorities[i] & try & CODING_CATEGORY_MASK_ISO))
                  mask |= detect_coding_iso2022 (src, src_end, multibytep);
                  iso2022_examined_p = 1;
              else if (priorities[i] & try & CODING_CATEGORY_MASK_SJIS)
                mask |= detect_coding_sjis (src, src_end, multibytep);
              else if (priorities[i] & try & CODING_CATEGORY_MASK_UTF_8)
                mask |= detect_coding_utf_8 (src, src_end, multibytep);
              else if (!utf16_examined_p
                       && (priorities[i] & try &
                           CODING_CATEGORY_MASK_UTF_16_BE_LE))
                  mask |= detect_coding_utf_16 (src, src_end, multibytep);
                  utf16_examined_p = 1;
              else if (priorities[i] & try & CODING_CATEGORY_MASK_BIG5)
                mask |= detect_coding_big5 (src, src_end, multibytep);
              else if (priorities[i] & try & CODING_CATEGORY_MASK_EMACS_MULE)
                mask |= detect_coding_emacs_mule (src, src_end, multibytep);
              else if (priorities[i] & try & CODING_CATEGORY_MASK_CCL)
                mask |= detect_coding_ccl (src, src_end, multibytep);
              else if (priorities[i] & CODING_CATEGORY_MASK_RAW_TEXT)
                mask |= CODING_CATEGORY_MASK_RAW_TEXT;
              else if (priorities[i] & CODING_CATEGORY_MASK_BINARY)
                mask |= CODING_CATEGORY_MASK_BINARY;
              if (mask & priorities[i])
                return priorities[i];
          return CODING_CATEGORY_MASK_RAW_TEXT;
      if (try & CODING_CATEGORY_MASK_ISO)
        mask |= detect_coding_iso2022 (src, src_end, multibytep);
      if (try & CODING_CATEGORY_MASK_SJIS)
        mask |= detect_coding_sjis (src, src_end, multibytep);
      if (try & CODING_CATEGORY_MASK_BIG5)
        mask |= detect_coding_big5 (src, src_end, multibytep);
      if (try & CODING_CATEGORY_MASK_UTF_8)
        mask |= detect_coding_utf_8 (src, src_end, multibytep);
      if (try & CODING_CATEGORY_MASK_UTF_16_BE_LE)
        mask |= detect_coding_utf_16 (src, src_end, multibytep);
      if (try & CODING_CATEGORY_MASK_EMACS_MULE)
        mask |= detect_coding_emacs_mule (src, src_end, multibytep);
      if (try & CODING_CATEGORY_MASK_CCL)
        mask |= detect_coding_ccl (src, src_end, multibytep);

  return (mask | CODING_CATEGORY_MASK_RAW_TEXT | CODING_CATEGORY_MASK_BINARY);
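
/* Illustrative sketch, not part of the original source: picking the
   highest-priority category out of a bit mask of candidates, as the
   loops over `priorities' above do.  The function name, the explicit
   N parameter, and the FALLBACK parameter are hypothetical; the real
   code loops to CODING_CATEGORY_IDX_MAX and falls back to
   CODING_CATEGORY_MASK_RAW_TEXT.  */

```c
/* PRIORITIES holds N single-category masks, highest priority first.
   Return the first one that shares a bit with MASK, or FALLBACK if
   none does.  */
unsigned int
sketch_pick_by_priority (unsigned int mask, const unsigned int *priorities,
                         int n, unsigned int fallback)
{
  int i;

  for (i = 0; i < n; i++)
    if (mask & priorities[i])
      return priorities[i];
  return fallback;
}
```

/* For example, with priorities {0x4, 0x1, 0x2} and candidates 0x3,
   the result is 0x1: 0x4 is not a candidate, and 0x1 outranks 0x2.  */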

/* Detect how a text of length SRC_BYTES pointed to by SRC is encoded.
   The information of the detected coding system is set in CODING.  */

detect_coding (coding, src, src_bytes)
     struct coding_system *coding;
     const unsigned char *src;
  val = Vcoding_category_list;
  mask = detect_coding_mask (src, src_bytes, coding_priorities, &skip,
                             coding->src_multibyte);
  coding->heading_ascii = skip;

  /* We found a single coding system of the highest priority in MASK.  */
  while (mask && ! (mask & 1)) mask >>= 1, idx++;
    idx = CODING_CATEGORY_IDX_RAW_TEXT;

  val = SYMBOL_VALUE (XVECTOR (Vcoding_category_table)->contents[idx]);

  if (coding->eol_type != CODING_EOL_UNDECIDED)
      tmp = Fget (val, Qeol_type);
        val = XVECTOR (tmp)->contents[coding->eol_type];

  /* Set up this new coding system while preserving some slots.  */
    int src_multibyte = coding->src_multibyte;
    int dst_multibyte = coding->dst_multibyte;

    setup_coding_system (val, coding);
    coding->src_multibyte = src_multibyte;
    coding->dst_multibyte = dst_multibyte;
    coding->heading_ascii = skip;
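
/* Illustrative sketch, not part of the original source: turning a
   mask with a single category bit into a category index by shifting,
   as the `while (mask && ! (mask & 1))' loop above does.  The
   function name is hypothetical, and returning -1 for an empty mask
   is a choice made for this sketch; the code above substitutes
   CODING_CATEGORY_IDX_RAW_TEXT instead.  */

```c
/* Return the index of the lowest set bit of MASK, or -1 if MASK is
   zero.  */
int
sketch_lowest_bit_index (unsigned int mask)
{
  int idx = 0;

  if (!mask)
    return -1;
  while (!(mask & 1))
    mask >>= 1, idx++;
  return idx;
}
```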

/* Detect how end-of-line of a text of length SRC_BYTES pointed to by
   SOURCE is encoded.  Return one of CODING_EOL_LF, CODING_EOL_CRLF,
   CODING_EOL_CR, CODING_EOL_INCONSISTENT, and CODING_EOL_UNDECIDED.

   How many non-eol characters are at the head is returned as *SKIP.  */

#define MAX_EOL_CHECK_COUNT 3

detect_eol_type (source, src_bytes, skip)
     unsigned char *source;
     int src_bytes, *skip;
  unsigned char *src = source, *src_end = src + src_bytes;
  int total = 0;		/* How many end-of-lines are found so far.  */
  int eol_type = CODING_EOL_UNDECIDED;

  while (src < src_end && total < MAX_EOL_CHECK_COUNT)
      if (c == '\n' || c == '\r')
            *skip = src - 1 - source;
            this_eol_type = CODING_EOL_LF;
          else if (src >= src_end || *src != '\n')
            this_eol_type = CODING_EOL_CR;
            this_eol_type = CODING_EOL_CRLF, src++;

          if (eol_type == CODING_EOL_UNDECIDED)
            /* This is the first end-of-line.  */
            eol_type = this_eol_type;
          else if (eol_type != this_eol_type)
              /* The found type is different from what was found before.  */
              eol_type = CODING_EOL_INCONSISTENT;
    *skip = src_end - source;
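
/* Illustrative sketch, not part of the original source: the
   end-of-line detection described above as a standalone function.
   The enum and function names are hypothetical.  The sketch follows
   the same rule (examine at most three line ends and report
   inconsistency as soon as two of them differ) but does not compute
   *SKIP.  */

```c
#include <stddef.h>

/* Hypothetical result codes, used by this sketch only.  */
enum sketch_det_eol
{
  SKETCH_DET_LF,
  SKETCH_DET_CRLF,
  SKETCH_DET_CR,
  SKETCH_DET_UNDECIDED,
  SKETCH_DET_INCONSISTENT
};

#define SKETCH_MAX_EOL_CHECK_COUNT 3

/* Classify the end-of-line format of the SRC_BYTES bytes at SRC.  */
enum sketch_det_eol
sketch_detect_eol (const unsigned char *src, size_t src_bytes)
{
  const unsigned char *end = src + src_bytes;
  enum sketch_det_eol eol = SKETCH_DET_UNDECIDED;
  int total = 0;		/* Line ends examined so far.  */

  while (src < end && total < SKETCH_MAX_EOL_CHECK_COUNT)
    {
      unsigned char c = *src++;
      enum sketch_det_eol this_eol;

      if (c != '\n' && c != '\r')
        continue;
      total++;
      if (c == '\n')
        this_eol = SKETCH_DET_LF;
      else if (src >= end || *src != '\n')
        this_eol = SKETCH_DET_CR;
      else
        {
          /* CR immediately followed by LF: consume the LF too.  */
          this_eol = SKETCH_DET_CRLF;
          src++;
        }

      if (eol == SKETCH_DET_UNDECIDED)
        eol = this_eol;		/* This is the first end-of-line.  */
      else if (eol != this_eol)
        return SKETCH_DET_INCONSISTENT;
    }
  return eol;
}
```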

/* Like detect_eol_type, but detect EOL type in 2-octet
   big-endian/little-endian format for coding systems utf-16-be and
   utf-16-le.  */

detect_eol_type_in_2_octet_form (source, src_bytes, skip, big_endian_p)
     unsigned char *source;
     int src_bytes, *skip, big_endian_p;
  unsigned char *src = source, *src_end = src + src_bytes;
  unsigned int c1, c2;
  int total = 0;		/* How many end-of-lines are found so far.  */
  int eol_type = CODING_EOL_UNDECIDED;

  while ((src + 1) < src_end && total < MAX_EOL_CHECK_COUNT)
      c1 = (src[msb] << 8) | (src[lsb]);
      if (c1 == '\n' || c1 == '\r')
            *skip = src - 2 - source;
              this_eol_type = CODING_EOL_LF;
              if ((src + 1) >= src_end)
                  this_eol_type = CODING_EOL_CR;
                  c2 = (src[msb] << 8) | (src[lsb]);
                    this_eol_type = CODING_EOL_CRLF, src += 2;
                    this_eol_type = CODING_EOL_CR;

          if (eol_type == CODING_EOL_UNDECIDED)
            /* This is the first end-of-line.  */
            eol_type = this_eol_type;
          else if (eol_type != this_eol_type)
              /* The found type is different from what was found before.  */
              eol_type = CODING_EOL_INCONSISTENT;
    *skip = src_end - source;
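
/* Illustrative sketch, not part of the original source: reading one
   2-octet unit with a chosen byte order, the operation behind
   `(src[msb] << 8) | (src[lsb])' above.  The function name is
   hypothetical.  How msb and lsb are derived from BIG_ENDIAN_P is not
   visible in this part of the file; the sketch assumes the usual
   convention that big-endian puts the most significant octet
   first.  */

```c
/* Return the 16-bit value stored in the two octets at SRC, most
   significant octet first if BIG_ENDIAN_P is nonzero, least
   significant octet first otherwise.  */
unsigned int
sketch_read_u16 (const unsigned char *src, int big_endian_p)
{
  int msb = big_endian_p ? 0 : 1;
  int lsb = big_endian_p ? 1 : 0;

  return ((unsigned int) src[msb] << 8) | src[lsb];
}
```

/* With this helper a line feed is recognized as the unit 0x000A in
   either byte order, so the rest of the detection can compare
   against '\n' and '\r' as in the 1-octet case.  */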

/* Detect how end-of-line of a text of length SRC_BYTES pointed to by SRC
   is encoded.  If it detects an appropriate format of end-of-line, it
   sets the information in *CODING.  */

detect_eol (coding, src, src_bytes)
     struct coding_system *coding;
     const unsigned char *src;
  switch (coding->category_idx)
    case CODING_CATEGORY_IDX_UTF_16_BE:
      eol_type = detect_eol_type_in_2_octet_form (src, src_bytes, &skip, 1);
    case CODING_CATEGORY_IDX_UTF_16_LE:
      eol_type = detect_eol_type_in_2_octet_form (src, src_bytes, &skip, 0);
      eol_type = detect_eol_type (src, src_bytes, &skip);

  if (coding->heading_ascii > skip)
    coding->heading_ascii = skip;
    skip = coding->heading_ascii;

  if (eol_type == CODING_EOL_UNDECIDED)
  if (eol_type == CODING_EOL_INCONSISTENT)
      /* This code is suppressed until we find a better way to
         distinguish raw text file and binary file.  */

      /* If we have already detected that the coding is raw-text, the
         coding should actually be no-conversion.  */
      if (coding->type == coding_type_raw_text)
          setup_coding_system (Qno_conversion, coding);
      /* Else, let's decode only text code anyway.  */
      eol_type = CODING_EOL_LF;

  val = Fget (coding->symbol, Qeol_type);
  if (VECTORP (val) && XVECTOR (val)->size == 3)
      int src_multibyte = coding->src_multibyte;
      int dst_multibyte = coding->dst_multibyte;
      struct composition_data *cmp_data = coding->cmp_data;

      setup_coding_system (XVECTOR (val)->contents[eol_type], coding);
      coding->src_multibyte = src_multibyte;
      coding->dst_multibyte = dst_multibyte;
      coding->heading_ascii = skip;
      coding->cmp_data = cmp_data;

#define CONVERSION_BUFFER_EXTRA_ROOM 256

#define DECODING_BUFFER_MAG(coding)			\
  (coding->type == coding_type_iso2022			\
   : (coding->type == coding_type_ccl			\
      ? coding->spec.ccl.decoder.buf_magnification	\

/* Return maximum size (bytes) of a buffer large enough for decoding
   SRC_BYTES of text encoded in CODING.  */

decoding_buffer_size (coding, src_bytes)
     struct coding_system *coding;
  return (src_bytes * DECODING_BUFFER_MAG (coding)
          + CONVERSION_BUFFER_EXTRA_ROOM);

/* Return maximum size (bytes) of a buffer large enough for encoding
   SRC_BYTES of text to CODING.  */

encoding_buffer_size (coding, src_bytes)
     struct coding_system *coding;
  if (coding->type == coding_type_ccl)
      magnification = coding->spec.ccl.encoder.buf_magnification;
      if (coding->eol_type == CODING_EOL_CRLF)
  else if (CODING_REQUIRE_ENCODING (coding))
  return (src_bytes * magnification + CONVERSION_BUFFER_EXTRA_ROOM);
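
/* Illustrative sketch, not part of the original source: the size
   estimate used by both functions above, namely source length times a
   per-coding-system magnification plus a fixed amount of extra room.
   The names are hypothetical, and the magnification is passed in
   directly because the factors for the individual coding types are
   not visible in this part of the file.  */

```c
#define SKETCH_EXTRA_ROOM 256

/* Return a buffer size sufficient for converting SRC_BYTES bytes
   when each source byte can expand to at most MAGNIFICATION bytes.  */
int
sketch_conversion_buffer_size (int src_bytes, int magnification)
{
  return src_bytes * magnification + SKETCH_EXTRA_ROOM;
}
```

/* For example, 100 source bytes with a magnification of 3 need a
   buffer of 100 * 3 + 256 = 556 bytes.  */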
4522 /* Working buffer for code conversion. */
4523 struct conversion_buffer
4525 int size
; /* size of data. */
4526 int on_stack
; /* 1 if allocated by alloca. */
4527 unsigned char *data
;
4530 /* Don't use alloca for allocating memory space larger than this, lest
4531 we overflow their stack. */
4532 #define MAX_ALLOCA 16*1024
4534 /* Allocate LEN bytes of memory for BUF (struct conversion_buffer). */
4535 #define allocate_conversion_buffer(buf, len) \
4537 if (len < MAX_ALLOCA) \
4539 buf.data = (unsigned char *) alloca (len); \
4544 buf.data = (unsigned char *) xmalloc (len); \
4550 /* Double the allocated memory for *BUF. */
4552 extend_conversion_buffer (buf
)
4553 struct conversion_buffer
*buf
;
4557 unsigned char *save
= buf
->data
;
4558 buf
->data
= (unsigned char *) xmalloc (buf
->size
* 2);
4559 bcopy (save
, buf
->data
, buf
->size
);
4564 buf
->data
= (unsigned char *) xrealloc (buf
->data
, buf
->size
* 2);
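
/* Illustration only, not part of the original source: a minimal
   sketch of the doubling strategy used above.  A buffer that does
   not yet live on the heap is copied into fresh heap memory, while a
   heap buffer is grown with realloc.  The struct and function names
   are hypothetical, and standard malloc/realloc/memcpy stand in for
   xmalloc/xrealloc/bcopy, so allocation failure is reported to the
   caller here.  */

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct sketch_buffer
{
  size_t size;			/* bytes available in DATA */
  int on_heap;			/* nonzero once DATA came from malloc */
  unsigned char *data;
};

/* Double the capacity of BUF, preserving its contents.  Return 0 on
   success, or -1 if memory could not be allocated (BUF is then left
   unchanged).  */
static int
sketch_extend (struct sketch_buffer *buf)
{
  size_t new_size = buf->size * 2;
  unsigned char *p;

  assert (buf->size > 0);
  if (buf->on_heap)
    p = realloc (buf->data, new_size);
  else
    {
      /* Stack (or otherwise borrowed) memory cannot be reallocated,
	 so copy it into a new heap block.  */
      p = malloc (new_size);
      if (p)
	memcpy (p, buf->data, buf->size);
    }
  if (!p)
    return -1;
  buf->data = p;
  buf->size = new_size;
  buf->on_heap = 1;
  return 0;
}
```

/* After the first call the data is always on the heap, so the owner
   must release it with free once conversion is finished.  */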
/* Free the allocated memory for BUF if it is not on stack.  */

free_conversion_buffer (buf)
     struct conversion_buffer *buf;
ccl_coding_driver (coding, source, destination, src_bytes, dst_bytes, encodep)
     struct coding_system *coding;
     unsigned char *source, *destination;
     int src_bytes, dst_bytes, encodep;

  struct ccl_program *ccl
    = encodep ? &coding->spec.ccl.encoder : &coding->spec.ccl.decoder;
  unsigned char *dst = destination;

  ccl->suppress_error = coding->suppress_error;
  ccl->last_block = coding->mode & CODING_MODE_LAST_BLOCK;
      /* On encoding, EOL format is converted within ccl_driver.  For
	 that, set up proper information in the structure CCL.  */
      ccl->eol_type = coding->eol_type;
      if (ccl->eol_type == CODING_EOL_UNDECIDED)
	ccl->eol_type = CODING_EOL_LF;
      ccl->cr_consumed = coding->spec.ccl.cr_carryover;
      ccl->eight_bit_control = coding->dst_multibyte;
4601 ccl
->eight_bit_control
= 1;
4602 ccl
->multibyte
= coding
->src_multibyte
;
4603 if (coding
->spec
.ccl
.eight_bit_carryover
[0] != 0)
4605 /* Move carryover bytes to DESTINATION. */
4606 unsigned char *p
= coding
->spec
.ccl
.eight_bit_carryover
;
4609 coding
->spec
.ccl
.eight_bit_carryover
[0] = 0;
4611 dst_bytes
-= dst
- destination
;
4614 coding
->produced
= (ccl_driver (ccl
, source
, dst
, src_bytes
, dst_bytes
,
4615 &(coding
->consumed
))
4616 + dst
- destination
);
4620 coding
->produced_char
= coding
->produced
;
4621 coding
->spec
.ccl
.cr_carryover
= ccl
->cr_consumed
;
4623 else if (!ccl
->eight_bit_control
)
      /* The produced bytes form a valid multibyte sequence.  */
      coding->produced_char
	= multibyte_chars_in_text (destination, coding->produced);
      coding->spec.ccl.eight_bit_carryover[0] = 0;
      /* On decoding, the destination should always be multibyte.
	 But the CCL program might have generated an invalid
	 multibyte sequence.  Here we make such a sequence valid as
	 multibyte text.  */
4637 = dst_bytes
? dst_bytes
: source
+ coding
->consumed
- destination
;
4639 if ((coding
->consumed
< src_bytes
4640 || !ccl
->last_block
)
4641 && coding
->produced
>= 1
4642 && destination
[coding
->produced
- 1] >= 0x80)
	  /* We should not convert the trailing 8-bit codes to
	     multibyte form even if they don't form a valid
	     multibyte sequence.  They may form a valid sequence in
	     the next call.  */
4650 if (destination
[coding
->produced
- 1] < 0xA0)
4652 else if (coding
->produced
>= 2)
4654 if (destination
[coding
->produced
- 2] >= 0x80)
4656 if (destination
[coding
->produced
- 2] < 0xA0)
4658 else if (coding
->produced
>= 3
4659 && destination
[coding
->produced
- 3] >= 0x80
4660 && destination
[coding
->produced
- 3] < 0xA0)
4666 BCOPY_SHORT (destination
+ coding
->produced
- carryover
,
4667 coding
->spec
.ccl
.eight_bit_carryover
,
4669 coding
->spec
.ccl
.eight_bit_carryover
[carryover
] = 0;
4670 coding
->produced
-= carryover
;
4673 coding
->produced
= str_as_multibyte (destination
, bytes
,
4675 &(coding
->produced_char
));
4678 switch (ccl
->status
)
4680 case CCL_STAT_SUSPEND_BY_SRC
:
4681 coding
->result
= CODING_FINISH_INSUFFICIENT_SRC
;
4683 case CCL_STAT_SUSPEND_BY_DST
:
4684 coding
->result
= CODING_FINISH_INSUFFICIENT_DST
;
4687 case CCL_STAT_INVALID_CMD
:
4688 coding
->result
= CODING_FINISH_INTERRUPT
;
4691 coding
->result
= CODING_FINISH_NORMAL
;
4694 return coding
->result
;
/* Decode EOL format of the text at PTR of BYTES length destructively
   according to CODING->eol_type.  This is called after the CCL
   program produced a decoded text at PTR.  If we do CRLF->LF
   conversion, update CODING->produced and CODING->produced_char.  */

decode_eol_post_ccl (coding, ptr, bytes)
     struct coding_system *coding;
4708 Lisp_Object val
, saved_coding_symbol
;
4709 unsigned char *pend
= ptr
+ bytes
;
  /* Remember the current coding system symbol.  We set it back when
     an inconsistent EOL is found so that `last-coding-system-used' is
     set to the coding system that doesn't specify EOL conversion.  */
  saved_coding_symbol = coding->symbol;

  coding->spec.ccl.cr_carryover = 0;
4718 if (coding
->eol_type
== CODING_EOL_UNDECIDED
)
4720 /* Here, to avoid the call of setup_coding_system, we directly
4721 call detect_eol_type. */
4722 coding
->eol_type
= detect_eol_type (ptr
, bytes
, &dummy
);
4723 if (coding
->eol_type
== CODING_EOL_INCONSISTENT
)
4724 coding
->eol_type
= CODING_EOL_LF
;
4725 if (coding
->eol_type
!= CODING_EOL_UNDECIDED
)
4727 val
= Fget (coding
->symbol
, Qeol_type
);
4728 if (VECTORP (val
) && XVECTOR (val
)->size
== 3)
4729 coding
->symbol
= XVECTOR (val
)->contents
[coding
->eol_type
];
4731 coding
->mode
|= CODING_MODE_INHIBIT_INCONSISTENT_EOL
;
4734 if (coding
->eol_type
== CODING_EOL_LF
4735 || coding
->eol_type
== CODING_EOL_UNDECIDED
)
4737 /* We have nothing to do. */
4740 else if (coding
->eol_type
== CODING_EOL_CRLF
)
4742 unsigned char *pstart
= ptr
, *p
= ptr
;
4744 if (! (coding
->mode
& CODING_MODE_LAST_BLOCK
)
4745 && *(pend
- 1) == '\r')
4747 /* If the last character is CR, we can't handle it here
4748 because LF will be in the not-yet-decoded source text.
4749 Record that the CR is not yet processed. */
4750 coding
->spec
.ccl
.cr_carryover
= 1;
4752 coding
->produced_char
--;
4759 if (ptr
+ 1 < pend
&& *(ptr
+ 1) == '\n')
4766 if (coding
->mode
& CODING_MODE_INHIBIT_INCONSISTENT_EOL
)
4767 goto undo_eol_conversion
;
4771 else if (*ptr
== '\n'
4772 && coding
->mode
& CODING_MODE_INHIBIT_INCONSISTENT_EOL
)
4773 goto undo_eol_conversion
;
4778 undo_eol_conversion
:
	  /* We have encountered an inconsistent EOL format at PTR.
	     Convert all LFs before PTR back to CRLFs.  */
4781 for (p
--, ptr
--; p
>= pstart
; p
--)
4784 *ptr
-- = '\n', *ptr
-- = '\r';
4788 /* If carryover is recorded, cancel it because we don't
4789 convert CRLF anymore. */
4790 if (coding
->spec
.ccl
.cr_carryover
)
4792 coding
->spec
.ccl
.cr_carryover
= 0;
4794 coding
->produced_char
++;
4798 coding
->eol_type
= CODING_EOL_LF
;
4799 coding
->symbol
= saved_coding_symbol
;
	  /* As each two-byte sequence CRLF was converted to LF, (PEND
	     - P) is the number of deleted characters.  */
	  coding->produced -= pend - p;
	  coding->produced_char -= pend - p;
4809 else /* i.e. coding->eol_type == CODING_EOL_CR */
4811 unsigned char *p
= ptr
;
4813 for (; ptr
< pend
; ptr
++)
4817 else if (*ptr
== '\n'
4818 && coding
->mode
& CODING_MODE_INHIBIT_INCONSISTENT_EOL
)
4820 for (; p
< ptr
; p
++)
4826 coding
->eol_type
= CODING_EOL_LF
;
4827 coding
->symbol
= saved_coding_symbol
;
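
/* Illustration only, not part of the original source: a reduced
   sketch of the in-place CRLF->LF step performed above, including
   the carryover of a trailing CR when more input is still to come.
   The function name and signature are hypothetical.  Unlike
   decode_eol_post_ccl, this sketch does not detect inconsistent EOL
   formats or undo the conversion; a lone CR or LF is simply copied
   through.  */

```c
#include <assert.h>
#include <stddef.h>

/* Convert each CRLF to LF in place in BUF[0..LEN).  If LAST_BLOCK is
   zero and the text ends in CR, leave that CR out of the result and
   set *CR_CARRYOVER to 1 so the caller can prepend it to the next
   block.  Return the new length.  */
static size_t
sketch_crlf_to_lf (unsigned char *buf, size_t len, int last_block,
		   int *cr_carryover)
{
  size_t in = 0, out = 0;

  assert (buf && cr_carryover);
  *cr_carryover = 0;
  if (!last_block && len > 0 && buf[len - 1] == '\r')
    {
      /* The matching LF, if any, is in text not yet decoded.  */
      *cr_carryover = 1;
      len--;
    }
  while (in < len)
    {
      if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
	in++;			/* drop the CR, keep the LF */
      buf[out++] = buf[in++];
    }
  return out;
}
```

/* Because the write index never passes the read index, the
   conversion needs no second buffer.  */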
/* See "GENERAL NOTES about `decode_coding_XXX ()' functions".  Before
   decoding, it may detect coding system and format of end-of-line if
   those are not yet decided.  The source should be unibyte, the
   result is multibyte if CODING->dst_multibyte is nonzero, else
   unibyte.  */

decode_coding (coding, source, destination, src_bytes, dst_bytes)
     struct coding_system *coding;
     const unsigned char *source;
     unsigned char *destination;
     int src_bytes, dst_bytes;
4848 if (coding
->type
== coding_type_undecided
)
4849 detect_coding (coding
, source
, src_bytes
);
4851 if (coding
->eol_type
== CODING_EOL_UNDECIDED
4852 && coding
->type
!= coding_type_ccl
)
4854 detect_eol (coding
, source
, src_bytes
);
      /* We had better recover the original eol format if we
	 encounter an inconsistent eol format while decoding.  */
      coding->mode |= CODING_MODE_INHIBIT_INCONSISTENT_EOL;

  coding->produced = coding->produced_char = 0;
  coding->consumed = coding->consumed_char = 0;
  coding->result = CODING_FINISH_NORMAL;
4865 switch (coding
->type
)
4867 case coding_type_sjis
:
4868 decode_coding_sjis_big5 (coding
, source
, destination
,
4869 src_bytes
, dst_bytes
, 1);
4872 case coding_type_iso2022
:
4873 decode_coding_iso2022 (coding
, source
, destination
,
4874 src_bytes
, dst_bytes
);
4877 case coding_type_big5
:
4878 decode_coding_sjis_big5 (coding
, source
, destination
,
4879 src_bytes
, dst_bytes
, 0);
4882 case coding_type_emacs_mule
:
4883 decode_coding_emacs_mule (coding
, source
, destination
,
4884 src_bytes
, dst_bytes
);
4887 case coding_type_ccl
:
4888 if (coding
->spec
.ccl
.cr_carryover
)
	  /* Put the CR which was not processed by the previous call
	     of decode_eol_post_ccl in DESTINATION.  It will be
	     decoded together with the following LF by the call to
	     decode_eol_post_ccl below.  */
	  *destination = '\r';
4896 coding
->produced_char
++;
4898 extra
= coding
->spec
.ccl
.cr_carryover
;
4900 ccl_coding_driver (coding
, source
, destination
+ extra
,
4901 src_bytes
, dst_bytes
, 0);
4902 if (coding
->eol_type
!= CODING_EOL_LF
)
4904 coding
->produced
+= extra
;
4905 coding
->produced_char
+= extra
;
4906 decode_eol_post_ccl (coding
, destination
, coding
->produced
);
4911 decode_eol (coding
, source
, destination
, src_bytes
, dst_bytes
);
4914 if (coding
->result
== CODING_FINISH_INSUFFICIENT_SRC
4915 && coding
->mode
& CODING_MODE_LAST_BLOCK
4916 && coding
->consumed
== src_bytes
)
4917 coding
->result
= CODING_FINISH_NORMAL
;
4919 if (coding
->mode
& CODING_MODE_LAST_BLOCK
4920 && coding
->result
== CODING_FINISH_INSUFFICIENT_SRC
)
4922 const unsigned char *src
= source
+ coding
->consumed
;
4923 unsigned char *dst
= destination
+ coding
->produced
;
4925 src_bytes
-= coding
->consumed
;
4927 if (COMPOSING_P (coding
))
4928 DECODE_COMPOSITION_END ('1');
4932 dst
+= CHAR_STRING (c
, dst
);
4933 coding
->produced_char
++;
4935 coding
->consumed
= coding
->consumed_char
= src
- source
;
4936 coding
->produced
= dst
- destination
;
4937 coding
->result
= CODING_FINISH_NORMAL
;
4940 if (!coding
->dst_multibyte
)
4942 coding
->produced
= str_as_unibyte (destination
, coding
->produced
);
4943 coding
->produced_char
= coding
->produced
;
4946 return coding
->result
;
/* See "GENERAL NOTES about `encode_coding_XXX ()' functions".  The
   multibyteness of the source is CODING->src_multibyte, the
   multibyteness of the result is always unibyte.  */

encode_coding (coding, source, destination, src_bytes, dst_bytes)
     struct coding_system *coding;
     const unsigned char *source;
     unsigned char *destination;
     int src_bytes, dst_bytes;
4960 coding
->produced
= coding
->produced_char
= 0;
4961 coding
->consumed
= coding
->consumed_char
= 0;
4963 coding
->result
= CODING_FINISH_NORMAL
;
4965 switch (coding
->type
)
4967 case coding_type_sjis
:
4968 encode_coding_sjis_big5 (coding
, source
, destination
,
4969 src_bytes
, dst_bytes
, 1);
4972 case coding_type_iso2022
:
4973 encode_coding_iso2022 (coding
, source
, destination
,
4974 src_bytes
, dst_bytes
);
4977 case coding_type_big5
:
4978 encode_coding_sjis_big5 (coding
, source
, destination
,
4979 src_bytes
, dst_bytes
, 0);
4982 case coding_type_emacs_mule
:
4983 encode_coding_emacs_mule (coding
, source
, destination
,
4984 src_bytes
, dst_bytes
);
4987 case coding_type_ccl
:
4988 ccl_coding_driver (coding
, source
, destination
,
4989 src_bytes
, dst_bytes
, 1);
4993 encode_eol (coding
, source
, destination
, src_bytes
, dst_bytes
);
4996 if (coding
->mode
& CODING_MODE_LAST_BLOCK
4997 && coding
->result
== CODING_FINISH_INSUFFICIENT_SRC
)
4999 const unsigned char *src
= source
+ coding
->consumed
;
5000 unsigned char *dst
= destination
+ coding
->produced
;
5002 if (coding
->type
== coding_type_iso2022
)
5003 ENCODE_RESET_PLANE_AND_REGISTER
;
5004 if (COMPOSING_P (coding
))
5005 *dst
++ = ISO_CODE_ESC
, *dst
++ = '1';
5006 if (coding
->consumed
< src_bytes
)
5008 int len
= src_bytes
- coding
->consumed
;
5010 BCOPY_SHORT (src
, dst
, len
);
5011 if (coding
->src_multibyte
)
5012 len
= str_as_unibyte (dst
, len
);
5014 coding
->consumed
= src_bytes
;
5016 coding
->produced
= coding
->produced_char
= dst
- destination
;
5017 coding
->result
= CODING_FINISH_NORMAL
;
5020 if (coding
->result
== CODING_FINISH_INSUFFICIENT_SRC
5021 && coding
->consumed
== src_bytes
)
5022 coding
->result
= CODING_FINISH_NORMAL
;
5024 return coding
->result
;
/* Scan text in the region between *BEG and *END (byte positions),
   skip characters which we don't have to decode by coding system
   CODING at the head and tail, then set *BEG and *END to the region
   of the text we actually have to convert.  The caller should move
   the gap out of the region in advance if the region is from a
   buffer.

   If STR is not NULL, *BEG and *END are indices into STR.  */
5037 shrink_decoding_region (beg
, end
, coding
, str
)
5039 struct coding_system
*coding
;
5042 unsigned char *begp_orig
, *begp
, *endp_orig
, *endp
, c
;
5044 Lisp_Object translation_table
;
5046 if (coding
->type
== coding_type_ccl
5047 || coding
->type
== coding_type_undecided
5048 || coding
->eol_type
!= CODING_EOL_LF
5049 || !NILP (coding
->post_read_conversion
)
5050 || coding
->composing
!= COMPOSITION_DISABLED
)
5052 /* We can't skip any data. */
5055 if (coding
->type
== coding_type_no_conversion
5056 || coding
->type
== coding_type_raw_text
5057 || coding
->type
== coding_type_emacs_mule
)
5059 /* We need no conversion, but don't have to skip any data here.
5060 Decoding routine handles them effectively anyway. */
5064 translation_table
= coding
->translation_table_for_decode
;
5065 if (NILP (translation_table
) && !NILP (Venable_character_translation
))
5066 translation_table
= Vstandard_translation_table_for_decode
;
5067 if (CHAR_TABLE_P (translation_table
))
5070 for (i
= 0; i
< 128; i
++)
5071 if (!NILP (CHAR_TABLE_REF (translation_table
, i
)))
5074 /* Some ASCII character should be translated. We give up
5079 if (coding
->heading_ascii
>= 0)
5080 /* Detection routine has already found how much we can skip at the
5082 *beg
+= coding
->heading_ascii
;
5086 begp_orig
= begp
= str
+ *beg
;
5087 endp_orig
= endp
= str
+ *end
;
5091 begp_orig
= begp
= BYTE_POS_ADDR (*beg
);
5092 endp_orig
= endp
= begp
+ *end
- *beg
;
5095 eol_conversion
= (coding
->eol_type
== CODING_EOL_CR
5096 || coding
->eol_type
== CODING_EOL_CRLF
);
5098 switch (coding
->type
)
5100 case coding_type_sjis
:
5101 case coding_type_big5
:
5102 /* We can skip all ASCII characters at the head. */
5103 if (coding
->heading_ascii
< 0)
5106 while (begp
< endp
&& *begp
< 0x80 && *begp
!= '\r') begp
++;
5108 while (begp
< endp
&& *begp
< 0x80) begp
++;
5110 /* We can skip all ASCII characters at the tail except for the
5111 second byte of SJIS or BIG5 code. */
5113 while (begp
< endp
&& endp
[-1] < 0x80 && endp
[-1] != '\r') endp
--;
5115 while (begp
< endp
&& endp
[-1] < 0x80) endp
--;
5116 /* Do not consider LF as ascii if preceded by CR, since that
5117 confuses eol decoding. */
5118 if (begp
< endp
&& endp
< endp_orig
&& endp
[-1] == '\r' && endp
[0] == '\n')
5120 if (begp
< endp
&& endp
< endp_orig
&& endp
[-1] >= 0x80)
5124 case coding_type_iso2022
:
5125 if (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding
, 0) != CHARSET_ASCII
)
5126 /* We can't skip any data. */
5128 if (coding
->heading_ascii
< 0)
5130 /* We can skip all ASCII characters at the head except for a
5131 few control codes. */
5132 while (begp
< endp
&& (c
= *begp
) < 0x80
5133 && c
!= ISO_CODE_CR
&& c
!= ISO_CODE_SO
5134 && c
!= ISO_CODE_SI
&& c
!= ISO_CODE_ESC
5135 && (!eol_conversion
|| c
!= ISO_CODE_LF
))
5138 switch (coding
->category_idx
)
5140 case CODING_CATEGORY_IDX_ISO_8_1
:
5141 case CODING_CATEGORY_IDX_ISO_8_2
:
5142 /* We can skip all ASCII characters at the tail. */
5144 while (begp
< endp
&& (c
= endp
[-1]) < 0x80 && c
!= '\r') endp
--;
5146 while (begp
< endp
&& endp
[-1] < 0x80) endp
--;
5147 /* Do not consider LF as ascii if preceded by CR, since that
5148 confuses eol decoding. */
5149 if (begp
< endp
&& endp
< endp_orig
&& endp
[-1] == '\r' && endp
[0] == '\n')
5153 case CODING_CATEGORY_IDX_ISO_7
:
5154 case CODING_CATEGORY_IDX_ISO_7_TIGHT
:
5156 /* We can skip all characters at the tail except for 8-bit
5157 codes and ESC and the following 2-byte at the tail. */
5158 unsigned char *eight_bit
= NULL
;
5162 && (c
= endp
[-1]) != ISO_CODE_ESC
&& c
!= '\r')
5164 if (!eight_bit
&& c
& 0x80) eight_bit
= endp
;
5169 && (c
= endp
[-1]) != ISO_CODE_ESC
)
5171 if (!eight_bit
&& c
& 0x80) eight_bit
= endp
;
5174 /* Do not consider LF as ascii if preceded by CR, since that
5175 confuses eol decoding. */
5176 if (begp
< endp
&& endp
< endp_orig
5177 && endp
[-1] == '\r' && endp
[0] == '\n')
5179 if (begp
< endp
&& endp
[-1] == ISO_CODE_ESC
)
		  if (endp + 1 < endp_orig && endp[0] == '(' && endp[1] == 'B')
		    /* This is an ASCII designation sequence.  We can
		       surely skip the tail.  But, if we have
		       encountered an 8-bit code, skip only the codes
		       after that code.  */
		    endp = eight_bit ? eight_bit : endp + 2;
5188 /* Hmmm, we can't skip the tail. */
5200 *beg
+= begp
- begp_orig
;
5201 *end
+= endp
- endp_orig
;
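
/* Illustration only, not part of the original source: a simplified
   sketch of the SJIS/BIG5 case above, narrowing a region by skipping
   ASCII bytes at the head and tail.  The function name and the use
   of size_t indices are hypothetical.  CR is never skipped, which
   corresponds to the variant used when EOL conversion is pending,
   and the byte following a trailing 8-bit byte is kept because, as
   the comment above notes, it may be the second byte of a two-byte
   code.  The separate CR-LF adjustment is omitted.  */

```c
#include <assert.h>
#include <stddef.h>

/* Narrow the region [*BEG, *END) of STR so that it excludes leading
   and trailing ASCII bytes that need no conversion.  */
static void
sketch_shrink_region (const unsigned char *str, size_t *beg, size_t *end)
{
  size_t b = *beg, e = *end;
  size_t e_orig = e;

  assert (b <= e);
  /* Skip ASCII at the head, stopping at CR.  */
  while (b < e && str[b] < 0x80 && str[b] != '\r')
    b++;
  /* Skip ASCII at the tail, stopping at CR.  */
  while (b < e && str[e - 1] < 0x80 && str[e - 1] != '\r')
    e--;
  /* An ASCII-range byte right after an 8-bit byte may be the second
     byte of a two-byte code, so keep it in the region.  */
  if (b < e && e < e_orig && str[e - 1] >= 0x80)
    e++;
  *beg = b;
  *end = e;
}
```

/* When the whole region is ASCII the result is empty (*BEG == *END),
   which lets the caller avoid conversion entirely.  */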
5205 /* Like shrink_decoding_region but for encoding. */
5208 shrink_encoding_region (beg
, end
, coding
, str
)
5210 struct coding_system
*coding
;
5213 unsigned char *begp_orig
, *begp
, *endp_orig
, *endp
;
5215 Lisp_Object translation_table
;
5217 if (coding
->type
== coding_type_ccl
5218 || coding
->eol_type
== CODING_EOL_CRLF
5219 || coding
->eol_type
== CODING_EOL_CR
5220 || (coding
->cmp_data
&& coding
->cmp_data
->used
> 0))
5222 /* We can't skip any data. */
5225 if (coding
->type
== coding_type_no_conversion
5226 || coding
->type
== coding_type_raw_text
5227 || coding
->type
== coding_type_emacs_mule
5228 || coding
->type
== coding_type_undecided
)
5230 /* We need no conversion, but don't have to skip any data here.
5231 Encoding routine handles them effectively anyway. */
5235 translation_table
= coding
->translation_table_for_encode
;
5236 if (NILP (translation_table
) && !NILP (Venable_character_translation
))
5237 translation_table
= Vstandard_translation_table_for_encode
;
5238 if (CHAR_TABLE_P (translation_table
))
5241 for (i
= 0; i
< 128; i
++)
5242 if (!NILP (CHAR_TABLE_REF (translation_table
, i
)))
5245 /* Some ASCII character should be translated. We give up
5252 begp_orig
= begp
= str
+ *beg
;
5253 endp_orig
= endp
= str
+ *end
;
5257 begp_orig
= begp
= BYTE_POS_ADDR (*beg
);
5258 endp_orig
= endp
= begp
+ *end
- *beg
;
5261 eol_conversion
= (coding
->eol_type
== CODING_EOL_CR
5262 || coding
->eol_type
== CODING_EOL_CRLF
);
5264 /* Here, we don't have to check coding->pre_write_conversion because
5265 the caller is expected to have handled it already. */
5266 switch (coding
->type
)
5268 case coding_type_iso2022
:
5269 if (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding
, 0) != CHARSET_ASCII
)
5270 /* We can't skip any data. */
5272 if (coding
->flags
& CODING_FLAG_ISO_DESIGNATE_AT_BOL
)
5274 unsigned char *bol
= begp
;
5275 while (begp
< endp
&& *begp
< 0x80)
5278 if (begp
[-1] == '\n')
5282 goto label_skip_tail
;
5286 case coding_type_sjis
:
5287 case coding_type_big5
:
5288 /* We can skip all ASCII characters at the head and tail. */
5290 while (begp
< endp
&& *begp
< 0x80 && *begp
!= '\n') begp
++;
5292 while (begp
< endp
&& *begp
< 0x80) begp
++;
5295 while (begp
< endp
&& endp
[-1] < 0x80 && endp
[-1] != '\n') endp
--;
5297 while (begp
< endp
&& *(endp
- 1) < 0x80) endp
--;
5304 *beg
+= begp
- begp_orig
;
5305 *end
+= endp
- endp_orig
;
/* As shrinking conversion region requires some overhead, we don't try
   shrinking if the length of conversion region is less than this
   value.  */
static int shrink_conversion_region_threshhold = 1024;
5314 #define SHRINK_CONVERSION_REGION(beg, end, coding, str, encodep) \
5316 if (*(end) - *(beg) > shrink_conversion_region_threshhold) \
5318 if (encodep) shrink_encoding_region (beg, end, coding, str); \
5319 else shrink_decoding_region (beg, end, coding, str); \
5324 code_convert_region_unwind (arg
)
5327 inhibit_pre_post_conversion
= 0;
5328 Vlast_coding_system_used
= arg
;
5332 /* Store information about all compositions in the range FROM and TO
5333 of OBJ in memory blocks pointed by CODING->cmp_data. OBJ is a
5334 buffer or a string, defaults to the current buffer. */
5337 coding_save_composition (coding
, from
, to
, obj
)
5338 struct coding_system
*coding
;
5345 if (coding
->composing
== COMPOSITION_DISABLED
)
5347 if (!coding
->cmp_data
)
5348 coding_allocate_composition_data (coding
, from
);
5349 if (!find_composition (from
, to
, &start
, &end
, &prop
, obj
)
5353 && (!find_composition (end
, to
, &start
, &end
, &prop
, obj
)
5356 coding
->composing
= COMPOSITION_NO
;
5359 if (COMPOSITION_VALID_P (start
, end
, prop
))
5361 enum composition_method method
= COMPOSITION_METHOD (prop
);
5362 if (coding
->cmp_data
->used
+ COMPOSITION_DATA_MAX_BUNCH_LENGTH
5363 >= COMPOSITION_DATA_SIZE
)
5364 coding_allocate_composition_data (coding
, from
);
5365 /* For relative composition, we remember start and end
5366 positions, for the other compositions, we also remember
5368 CODING_ADD_COMPOSITION_START (coding
, start
- from
, method
);
5369 if (method
!= COMPOSITION_RELATIVE
)
5371 /* We must store a*/
5372 Lisp_Object val
, ch
;
5374 val
= COMPOSITION_COMPONENTS (prop
);
5378 ch
= XCAR (val
), val
= XCDR (val
);
5379 CODING_ADD_COMPOSITION_COMPONENT (coding
, XINT (ch
));
5381 else if (VECTORP (val
) || STRINGP (val
))
5383 int len
= (VECTORP (val
)
5384 ? XVECTOR (val
)->size
: SCHARS (val
));
5386 for (i
= 0; i
< len
; i
++)
5389 ? Faref (val
, make_number (i
))
5390 : XVECTOR (val
)->contents
[i
]);
5391 CODING_ADD_COMPOSITION_COMPONENT (coding
, XINT (ch
));
5394 else /* INTEGERP (val) */
5395 CODING_ADD_COMPOSITION_COMPONENT (coding
, XINT (val
));
5397 CODING_ADD_COMPOSITION_END (coding
, end
- from
);
5402 && find_composition (start
, to
, &start
, &end
, &prop
, obj
)
5405 /* Make coding->cmp_data point to the first memory block. */
5406 while (coding
->cmp_data
->prev
)
5407 coding
->cmp_data
= coding
->cmp_data
->prev
;
5408 coding
->cmp_data_start
= 0;
5411 /* Reflect the saved information about compositions to OBJ.
5412 CODING->cmp_data points to a memory block for the information. OBJ
5413 is a buffer or a string, defaults to the current buffer. */
5416 coding_restore_composition (coding
, obj
)
5417 struct coding_system
*coding
;
5420 struct composition_data
*cmp_data
= coding
->cmp_data
;
5425 while (cmp_data
->prev
)
5426 cmp_data
= cmp_data
->prev
;
5432 for (i
= 0; i
< cmp_data
->used
&& cmp_data
->data
[i
] > 0;
5433 i
+= cmp_data
->data
[i
])
5435 int *data
= cmp_data
->data
+ i
;
5436 enum composition_method method
= (enum composition_method
) data
[3];
5437 Lisp_Object components
;
5439 if (method
== COMPOSITION_RELATIVE
)
5443 int len
= data
[0] - 4, j
;
5444 Lisp_Object args
[MAX_COMPOSITION_COMPONENTS
* 2 - 1];
5446 if (method
== COMPOSITION_WITH_RULE_ALTCHARS
5449 for (j
= 0; j
< len
; j
++)
5450 args
[j
] = make_number (data
[4 + j
]);
5451 components
= (method
== COMPOSITION_WITH_ALTCHARS
5452 ? Fstring (len
, args
) : Fvector (len
, args
));
5454 compose_text (data
[1], data
[2], components
, Qnil
, obj
);
5456 cmp_data
= cmp_data
->next
;
/* Decode (if ENCODEP is zero) or encode (if ENCODEP is nonzero) the
   text from FROM to TO (byte positions are FROM_BYTE and TO_BYTE) by
   coding system CODING, and return the status code of code conversion
   (currently, this value has no meaning).

   How many characters (and bytes) are converted to how many
   characters (and bytes) are recorded in members of the structure
   CODING.

   If REPLACE is nonzero, we do various things as if the original text
   is deleted and a new text is inserted.  See the comments in
   replace_range (insdel.c) to know what we are doing.

   If REPLACE is zero, it is assumed that the source text is unibyte.
   Otherwise, it is assumed that the source text is multibyte.  */

code_convert_region (from, from_byte, to, to_byte, coding, encodep, replace)
     int from, from_byte, to, to_byte, encodep, replace;
     struct coding_system *coding;
5481 int len
= to
- from
, len_byte
= to_byte
- from_byte
;
5482 int nchars_del
= 0, nbytes_del
= 0;
5483 int require
, inserted
, inserted_byte
;
5484 int head_skip
, tail_skip
, total_skip
= 0;
5485 Lisp_Object saved_coding_symbol
;
5487 unsigned char *src
, *dst
;
5488 Lisp_Object deletion
;
5489 int orig_point
= PT
, orig_len
= len
;
5491 int multibyte_p
= !NILP (current_buffer
->enable_multibyte_characters
);
5494 saved_coding_symbol
= coding
->symbol
;
5496 if (from
< PT
&& PT
< to
)
5498 TEMP_SET_PT_BOTH (from
, from_byte
);
5504 int saved_from
= from
;
5505 int saved_inhibit_modification_hooks
;
5507 prepare_to_modify_buffer (from
, to
, &from
);
5508 if (saved_from
!= from
)
5511 from_byte
= CHAR_TO_BYTE (from
), to_byte
= CHAR_TO_BYTE (to
);
5512 len_byte
= to_byte
- from_byte
;
      /* The code conversion routine can not preserve text properties
	 for now.  So, we must remove all text properties in the
	 region.  Here, we must suppress all modification hooks.  */
      saved_inhibit_modification_hooks = inhibit_modification_hooks;
      inhibit_modification_hooks = 1;
      Fset_text_properties (make_number (from), make_number (to), Qnil, Qnil);
      inhibit_modification_hooks = saved_inhibit_modification_hooks;
5524 if (! encodep
&& CODING_REQUIRE_DETECTION (coding
))
5526 /* We must detect encoding of text and eol format. */
5528 if (from
< GPT
&& to
> GPT
)
5529 move_gap_both (from
, from_byte
);
5530 if (coding
->type
== coding_type_undecided
)
5532 detect_coding (coding
, BYTE_POS_ADDR (from_byte
), len_byte
);
5533 if (coding
->type
== coding_type_undecided
)
	      /* It seems that the text contains only ASCII, but we
		 should not leave it undecided because the deeper
		 decoding routine (decode_coding) tries to detect the
		 encodings again in vain.  */
	      coding->type = coding_type_emacs_mule;
	      coding->category_idx = CODING_CATEGORY_IDX_EMACS_MULE;
	      /* As emacs-mule decoder will handle composition, we
		 need this setting to allocate coding->cmp_data
		 later.  */
	      coding->composing = COMPOSITION_NO;
5547 if (coding
->eol_type
== CODING_EOL_UNDECIDED
5548 && coding
->type
!= coding_type_ccl
)
5550 detect_eol (coding
, BYTE_POS_ADDR (from_byte
), len_byte
);
5551 if (coding
->eol_type
== CODING_EOL_UNDECIDED
)
5552 coding
->eol_type
= CODING_EOL_LF
;
5553 /* We had better recover the original eol format if we
5554 encounter an inconsistent eol format while decoding. */
5555 coding
->mode
|= CODING_MODE_INHIBIT_INCONSISTENT_EOL
;
5559 /* Now we convert the text. */
5561 /* For encoding, we must process pre-write-conversion in advance. */
5562 if (! inhibit_pre_post_conversion
5564 && SYMBOLP (coding
->pre_write_conversion
)
5565 && ! NILP (Ffboundp (coding
->pre_write_conversion
)))
5567 /* The function in pre-write-conversion may put a new text in a
5569 struct buffer
*prev
= current_buffer
;
5572 record_unwind_protect (code_convert_region_unwind
,
5573 Vlast_coding_system_used
);
5574 /* We should not call any more pre-write/post-read-conversion
5575 functions while this pre-write-conversion is running. */
5576 inhibit_pre_post_conversion
= 1;
5577 call2 (coding
->pre_write_conversion
,
5578 make_number (from
), make_number (to
));
5579 inhibit_pre_post_conversion
= 0;
5580 /* Discard the unwind protect. */
5583 if (current_buffer
!= prev
)
5586 new = Fcurrent_buffer ();
5587 set_buffer_internal_1 (prev
);
5588 del_range_2 (from
, from_byte
, to
, to_byte
, 0);
5589 TEMP_SET_PT_BOTH (from
, from_byte
);
5590 insert_from_buffer (XBUFFER (new), 1, len
, 0);
5592 if (orig_point
>= to
)
5593 orig_point
+= len
- orig_len
;
5594 else if (orig_point
> from
)
5598 from_byte
= CHAR_TO_BYTE (from
);
5599 to_byte
= CHAR_TO_BYTE (to
);
5600 len_byte
= to_byte
- from_byte
;
5601 TEMP_SET_PT_BOTH (from
, from_byte
);
5607 if (! EQ (current_buffer
->undo_list
, Qt
))
5608 deletion
= make_buffer_string_both (from
, from_byte
, to
, to_byte
, 1);
5611 nchars_del
= to
- from
;
5612 nbytes_del
= to_byte
- from_byte
;
5616 if (coding
->composing
!= COMPOSITION_DISABLED
)
5619 coding_save_composition (coding
, from
, to
, Fcurrent_buffer ());
5621 coding_allocate_composition_data (coding
, from
);
  /* Try to skip the leading and trailing ASCII characters.  */
5625 if (coding
->type
!= coding_type_ccl
)
5627 int from_byte_orig
= from_byte
, to_byte_orig
= to_byte
;
5629 if (from
< GPT
&& GPT
< to
)
5630 move_gap_both (from
, from_byte
);
5631 SHRINK_CONVERSION_REGION (&from_byte
, &to_byte
, coding
, NULL
, encodep
);
5632 if (from_byte
== to_byte
5633 && (encodep
|| NILP (coding
->post_read_conversion
))
5634 && ! CODING_REQUIRE_FLUSHING (coding
))
5636 coding
->produced
= len_byte
;
5637 coding
->produced_char
= len
;
5639 /* We must record and adjust for this new text now. */
5640 adjust_after_insert (from
, from_byte_orig
, to
, to_byte_orig
, len
);
5644 head_skip
= from_byte
- from_byte_orig
;
5645 tail_skip
= to_byte_orig
- to_byte
;
5646 total_skip
= head_skip
+ tail_skip
;
5649 len
-= total_skip
; len_byte
-= total_skip
;
  /* For conversion, we must put the gap before the text in addition to
     making the gap larger for efficient decoding.  The required gap
     size starts from 2000 which is the magic number used in make_gap.
     But, after one batch of conversion, it will be incremented if we
     find that it is not enough.  */
5659 if (GAP_SIZE
< require
)
5660 make_gap (require
- GAP_SIZE
);
5661 move_gap_both (from
, from_byte
);
5663 inserted
= inserted_byte
= 0;
5665 GAP_SIZE
+= len_byte
;
5668 ZV_BYTE
-= len_byte
;
5671 if (GPT
- BEG
< BEG_UNCHANGED
)
5672 BEG_UNCHANGED
= GPT
- BEG
;
5673 if (Z
- GPT
< END_UNCHANGED
)
5674 END_UNCHANGED
= Z
- GPT
;
5676 if (!encodep
&& coding
->src_multibyte
)
      /* Decoding routines expect that the source text is unibyte.
	 We must convert 8-bit characters of multibyte form to
	 unibyte.  */
      int len_byte_orig = len_byte;
5682 len_byte
= str_as_unibyte (GAP_END_ADDR
- len_byte
, len_byte
);
5683 if (len_byte
< len_byte_orig
)
5684 safe_bcopy (GAP_END_ADDR
- len_byte_orig
, GAP_END_ADDR
- len_byte
,
5686 coding
->src_multibyte
= 0;
      /* The buffer memory is now:
	 +--------+converted-text+---------+-------original-text-------+---+
	 |<-from->|<--inserted-->|---------|<--------len_byte--------->|---|
		  |<---------------------- GAP ----------------------->|  */
      src = GAP_END_ADDR - len_byte;
      dst = GPT_ADDR + inserted_byte;
5701 result
= encode_coding (coding
, src
, dst
, len_byte
, 0);
5704 if (coding
->composing
!= COMPOSITION_DISABLED
)
5705 coding
->cmp_data
->char_offset
= from
+ inserted
;
5706 result
= decode_coding (coding
, src
, dst
, len_byte
, 0);
      /* The buffer memory is now:
	 +--------+-------converted-text----+--+------original-text----+---+
	 |<-from->|<-inserted->|<-produced->|--|<-(len_byte-consumed)->|---|
		  |<---------------------- GAP ----------------------->|  */

      inserted += coding->produced_char;
      inserted_byte += coding->produced;
      len_byte -= coding->consumed;
5718 if (result
== CODING_FINISH_INSUFFICIENT_CMP
)
5720 coding_allocate_composition_data (coding
, from
+ inserted
);
5724 src
+= coding
->consumed
;
5725 dst
+= coding
->produced
;
5727 if (result
== CODING_FINISH_NORMAL
)
5732 if (! encodep
&& result
== CODING_FINISH_INCONSISTENT_EOL
)
5734 unsigned char *pend
= dst
, *p
= pend
- inserted_byte
;
5735 Lisp_Object eol_type
;
5737 /* Encode LFs back to the original eol format (CR or CRLF). */
5738 if (coding
->eol_type
== CODING_EOL_CR
)
5740 while (p
< pend
) if (*p
++ == '\n') p
[-1] = '\r';
5746 while (p
< pend
) if (*p
++ == '\n') count
++;
5747 if (src
- dst
< count
)
5749 /* We don't have sufficient room for encoding LFs
5750 back to CRLF. We must record converted and
5751 not-yet-converted text back to the buffer
5752 content, enlarge the gap, then record them out of
5753 the buffer contents again. */
int add = len_byte + inserted_byte;
ZV += add; Z += add; ZV_BYTE += add; Z_BYTE += add;
GPT += inserted_byte; GPT_BYTE += inserted_byte;
make_gap (count - GAP_SIZE);
ZV -= add; Z -= add; ZV_BYTE -= add; Z_BYTE -= add;
GPT -= inserted_byte; GPT_BYTE -= inserted_byte;
/* Don't forget to update SRC, DST, and PEND. */
src = GAP_END_ADDR - len_byte;
dst = GPT_ADDR + inserted_byte;
inserted_byte += count;
coding->produced += count;
p = dst = pend + count;
if (*p == '\n') count--, *--p = '\r';
/* Suppress eol-format conversion in the further conversion. */
coding->eol_type = CODING_EOL_LF;
/* Set the coding system symbol to that for Unix-like EOL. */
eol_type = Fget (saved_coding_symbol, Qeol_type);
if (VECTORP (eol_type)
    && XVECTOR (eol_type)->size == 3
    && SYMBOLP (XVECTOR (eol_type)->contents[CODING_EOL_LF]))
  coding->symbol = XVECTOR (eol_type)->contents[CODING_EOL_LF];
coding->symbol = saved_coding_symbol;
if (coding->type != coding_type_ccl
    || coding->mode & CODING_MODE_LAST_BLOCK)
coding->mode |= CODING_MODE_LAST_BLOCK;
if (result == CODING_FINISH_INSUFFICIENT_SRC)
/* The source text ends in invalid codes. Let's just
   make them valid buffer contents, and finish conversion. */
unsigned char *start = dst;
inserted += len_byte;
dst += CHAR_STRING (c, dst);
inserted_byte += dst - start;
inserted += len_byte;
inserted_byte += len_byte;
if (result == CODING_FINISH_INTERRUPT)
/* The conversion procedure was interrupted by a user. */
/* Now RESULT == CODING_FINISH_INSUFFICIENT_DST */
if (coding->consumed < 1)
/* It's quite strange to require more memory without
   consuming any bytes. Perhaps CCL program bug. */
/* We have just done the first batch of conversion which was
   stopped because of insufficient gap. Let's reconsider the
   required gap size (i.e. SRC - DST) now.

   We have converted ORIG bytes (== coding->consumed) into
   NEW bytes (coding->produced). To convert the remaining
   LEN_BYTE bytes, we may need REQUIRE bytes of gap, where:
     REQUIRE + LEN_BYTE = LEN_BYTE * (NEW / ORIG)
     REQUIRE = LEN_BYTE * (NEW - ORIG) / ORIG
   Here, we are sure that NEW >= ORIG. */
if (coding->produced <= coding->consumed)
/* This happens because of CCL-based coding system with
ratio = (coding->produced - coding->consumed) / coding->consumed;
require = len_byte * ratio;
if ((src - dst) < (require + 2000))
/* See the comment above the previous call of make_gap. */
int add = len_byte + inserted_byte;
ZV += add; Z += add; ZV_BYTE += add; Z_BYTE += add;
GPT += inserted_byte; GPT_BYTE += inserted_byte;
make_gap (require + 2000);
ZV -= add; Z -= add; ZV_BYTE -= add; Z_BYTE -= add;
GPT -= inserted_byte; GPT_BYTE -= inserted_byte;
if (src - dst > 0) *dst = 0; /* Put an anchor. */
if (encodep && coding->dst_multibyte)
/* The output is unibyte. We must convert 8-bit characters to
if (inserted_byte * 2 > GAP_SIZE)
GAP_SIZE -= inserted_byte;
ZV += inserted_byte; Z += inserted_byte;
ZV_BYTE += inserted_byte; Z_BYTE += inserted_byte;
GPT += inserted_byte; GPT_BYTE += inserted_byte;
make_gap (inserted_byte - GAP_SIZE);
GAP_SIZE += inserted_byte;
ZV -= inserted_byte; Z -= inserted_byte;
ZV_BYTE -= inserted_byte; Z_BYTE -= inserted_byte;
GPT -= inserted_byte; GPT_BYTE -= inserted_byte;
inserted_byte = str_to_multibyte (GPT_ADDR, GAP_SIZE, inserted_byte);
/* If we shrank the conversion area, adjust it now. */
safe_bcopy (GAP_END_ADDR, GPT_ADDR + inserted_byte, tail_skip);
inserted += total_skip; inserted_byte += total_skip;
GAP_SIZE += total_skip;
GPT -= head_skip; GPT_BYTE -= head_skip;
ZV -= total_skip; ZV_BYTE -= total_skip;
Z -= total_skip; Z_BYTE -= total_skip;
from -= head_skip; from_byte -= head_skip;
to += tail_skip; to_byte += tail_skip;
if (! EQ (current_buffer->undo_list, Qt))
  adjust_after_replace (from, from_byte, deletion, inserted, inserted_byte);
adjust_after_replace_noundo (from, from_byte, nchars_del, nbytes_del,
                             inserted, inserted_byte);
inserted = Z - prev_Z;
if (!encodep && coding->cmp_data && coding->cmp_data->used)
  coding_restore_composition (coding, Fcurrent_buffer ());
coding_free_composition_data (coding);
if (! inhibit_pre_post_conversion
    && ! encodep && ! NILP (coding->post_read_conversion))
Lisp_Object saved_coding_system;
TEMP_SET_PT_BOTH (from, from_byte);
record_unwind_protect (code_convert_region_unwind,
                       Vlast_coding_system_used);
saved_coding_system = Vlast_coding_system_used;
Vlast_coding_system_used = coding->symbol;
/* We should not call any more pre-write/post-read-conversion
   functions while this post-read-conversion is running. */
inhibit_pre_post_conversion = 1;
val = call1 (coding->post_read_conversion, make_number (inserted));
inhibit_pre_post_conversion = 0;
coding->symbol = Vlast_coding_system_used;
Vlast_coding_system_used = saved_coding_system;
/* Discard the unwind protect. */
inserted += Z - prev_Z;
if (orig_point >= from)
if (orig_point >= from + orig_len)
  orig_point += inserted - orig_len;
TEMP_SET_PT (orig_point);
signal_after_change (from, to - from, inserted);
update_compositions (from, from + inserted, CHECK_BORDER);
coding->consumed = to_byte - from_byte;
coding->consumed_char = to - from;
coding->produced = inserted_byte;
coding->produced_char = inserted;
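/* Illustration only, not part of coding.c: a minimal sketch of the
   REQUIRE estimate described in code_convert_region above. The helper
   names estimate_required_gap and gap_needs_growth are hypothetical,
   and computing the ratio as a double is an assumption of this sketch.
   The 2000-byte slack mirrors the `require + 2000' test above. */

```c
#include <assert.h>

/* Extra room kept beyond the estimate, as in `require + 2000'. */
#define GAP_SLACK 2000

/* Estimate how many extra gap bytes are needed to convert the
   remaining LEN_BYTE source bytes, given that the first batch turned
   CONSUMED bytes into PRODUCED bytes:
     REQUIRE = LEN_BYTE * (NEW - ORIG) / ORIG
   Returns 0 when nothing was consumed or the text did not grow. */
static long
estimate_required_gap (long len_byte, long consumed, long produced)
{
  double ratio;

  if (consumed < 1 || produced <= consumed)
    return 0;

  /* Floating point keeps the fractional growth (e.g. 0.5). */
  ratio = (double) (produced - consumed) / (double) consumed;
  return (long) (len_byte * ratio);
}

/* Return 1 if AVAILABLE gap bytes are too few for REQUIRE plus the
   slack, i.e. the gap would have to be enlarged first. */
static int
gap_needs_growth (long available, long require)
{
  return available < require + GAP_SLACK;
}
```

/* With these assumptions, a first batch that turned 100 bytes into
   150 gives an estimate of 500 extra bytes for a remaining 1000. */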
run_pre_post_conversion_on_str (str, coding, encodep)
     struct coding_system *coding;
int count = SPECPDL_INDEX ();
struct gcpro gcpro1, gcpro2;
int multibyte = STRING_MULTIBYTE (str);
Lisp_Object old_deactivate_mark;
record_unwind_protect (Fset_buffer, Fcurrent_buffer ());
record_unwind_protect (code_convert_region_unwind,
                       Vlast_coding_system_used);
/* It is not crucial to specbind this. */
old_deactivate_mark = Vdeactivate_mark;
GCPRO2 (str, old_deactivate_mark);
buffer = Fget_buffer_create (build_string (" *code-converting-work*"));
buf = XBUFFER (buffer);
buf->directory = current_buffer->directory;
buf->read_only = Qnil;
buf->filename = Qnil;
buf->undo_list = Qt;
buf->overlays_before = NULL;
buf->overlays_after = NULL;
set_buffer_internal (buf);
/* We must insert the contents of STR as is without
   unibyte<->multibyte conversion. For that, we adjust the
   multibyteness of the working buffer to that of STR. */
buf->enable_multibyte_characters = multibyte ? Qt : Qnil;
insert_from_string (str, 0, 0,
                    SCHARS (str), SBYTES (str), 0);
inhibit_pre_post_conversion = 1;
call2 (coding->pre_write_conversion, make_number (BEG), make_number (Z));
Vlast_coding_system_used = coding->symbol;
TEMP_SET_PT_BOTH (BEG, BEG_BYTE);
call1 (coding->post_read_conversion, make_number (Z - BEG));
coding->symbol = Vlast_coding_system_used;
inhibit_pre_post_conversion = 0;
Vdeactivate_mark = old_deactivate_mark;
str = make_buffer_string (BEG, Z, 1);
return unbind_to (count, str);
decode_coding_string (str, coding, nocopy)
     struct coding_system *coding;
struct conversion_buffer buf;
Lisp_Object saved_coding_symbol;
int require_decoding;
int shrinked_bytes = 0;
int consumed, consumed_char, produced, produced_char;
to_byte = SBYTES (str);
saved_coding_symbol = coding->symbol;
coding->src_multibyte = STRING_MULTIBYTE (str);
coding->dst_multibyte = 1;
if (CODING_REQUIRE_DETECTION (coding))
/* See the comments in code_convert_region. */
if (coding->type == coding_type_undecided)
detect_coding (coding, SDATA (str), to_byte);
if (coding->type == coding_type_undecided)
coding->type = coding_type_emacs_mule;
coding->category_idx = CODING_CATEGORY_IDX_EMACS_MULE;
/* As emacs-mule decoder will handle composition, we
   need this setting to allocate coding->cmp_data
coding->composing = COMPOSITION_NO;
if (coding->eol_type == CODING_EOL_UNDECIDED
    && coding->type != coding_type_ccl)
saved_coding_symbol = coding->symbol;
detect_eol (coding, SDATA (str), to_byte);
if (coding->eol_type == CODING_EOL_UNDECIDED)
  coding->eol_type = CODING_EOL_LF;
/* We had better recover the original eol format if we
   encounter an inconsistent eol format while decoding. */
coding->mode |= CODING_MODE_INHIBIT_INCONSISTENT_EOL;
if (coding->type == coding_type_no_conversion
    || coding->type == coding_type_raw_text)
  coding->dst_multibyte = 0;
require_decoding = CODING_REQUIRE_DECODING (coding);
if (STRING_MULTIBYTE (str))
/* Decoding routines expect the source text to be unibyte. */
str = Fstring_as_unibyte (str);
to_byte = SBYTES (str);
coding->src_multibyte = 0;
/* Try to skip the leading and trailing ASCII characters. */
if (require_decoding && coding->type != coding_type_ccl)
SHRINK_CONVERSION_REGION (&from, &to_byte, coding, SDATA (str),
if (from == to_byte)
  require_decoding = 0;
shrinked_bytes = from + (SBYTES (str) - to_byte);
if (!require_decoding
    && !(SYMBOLP (coding->post_read_conversion)
         && !NILP (Ffboundp (coding->post_read_conversion))))
coding->consumed = SBYTES (str);
coding->consumed_char = SCHARS (str);
if (coding->dst_multibyte)
str = Fstring_as_multibyte (str);
coding->produced = SBYTES (str);
coding->produced_char = SCHARS (str);
return (nocopy ? str : Fcopy_sequence (str));
if (coding->composing != COMPOSITION_DISABLED)
  coding_allocate_composition_data (coding, from);
len = decoding_buffer_size (coding, to_byte - from);
allocate_conversion_buffer (buf, len);
consumed = consumed_char = produced = produced_char = 0;
result = decode_coding (coding, SDATA (str) + from + consumed,
                        buf.data + produced, to_byte - from - consumed,
                        buf.size - produced);
consumed += coding->consumed;
consumed_char += coding->consumed_char;
produced += coding->produced;
produced_char += coding->produced_char;
if (result == CODING_FINISH_NORMAL
    || (result == CODING_FINISH_INSUFFICIENT_SRC
        && coding->consumed == 0))
if (result == CODING_FINISH_INSUFFICIENT_CMP)
  coding_allocate_composition_data (coding, from + produced_char);
else if (result == CODING_FINISH_INSUFFICIENT_DST)
  extend_conversion_buffer (&buf);
else if (result == CODING_FINISH_INCONSISTENT_EOL)
Lisp_Object eol_type;
/* Recover the original EOL format. */
if (coding->eol_type == CODING_EOL_CR)
for (p = buf.data; p < buf.data + produced; p++)
  if (*p == '\n') *p = '\r';
else if (coding->eol_type == CODING_EOL_CRLF)
unsigned char *p0, *p1;
for (p0 = buf.data, p1 = p0 + produced; p0 < p1; p0++)
  if (*p0 == '\n') num_eol++;
if (produced + num_eol >= buf.size)
  extend_conversion_buffer (&buf);
for (p0 = buf.data + produced, p1 = p0 + num_eol; p0 > buf.data;)
if (*p0 == '\n') *--p1 = '\r';
produced += num_eol;
produced_char += num_eol;
/* Suppress eol-format conversion in the further conversion. */
coding->eol_type = CODING_EOL_LF;
/* Set the coding system symbol to that for Unix-like EOL. */
eol_type = Fget (saved_coding_symbol, Qeol_type);
if (VECTORP (eol_type)
    && XVECTOR (eol_type)->size == 3
    && SYMBOLP (XVECTOR (eol_type)->contents[CODING_EOL_LF]))
  coding->symbol = XVECTOR (eol_type)->contents[CODING_EOL_LF];
coding->symbol = saved_coding_symbol;
coding->consumed = consumed;
coding->consumed_char = consumed_char;
coding->produced = produced;
coding->produced_char = produced_char;
if (coding->dst_multibyte)
  newstr = make_uninit_multibyte_string (produced_char + shrinked_bytes,
                                         produced + shrinked_bytes);
newstr = make_uninit_string (produced + shrinked_bytes);
STRING_COPYIN (newstr, 0, SDATA (str), from);
STRING_COPYIN (newstr, from, buf.data, produced);
if (shrinked_bytes > from)
  STRING_COPYIN (newstr, from + produced,
                 SDATA (str) + to_byte,
                 shrinked_bytes - from);
free_conversion_buffer (&buf);
if (coding->cmp_data && coding->cmp_data->used)
  coding_restore_composition (coding, newstr);
coding_free_composition_data (coding);
if (SYMBOLP (coding->post_read_conversion)
    && !NILP (Ffboundp (coding->post_read_conversion)))
  newstr = run_pre_post_conversion_on_str (newstr, coding, 0);
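/* Illustration only, not part of coding.c: the CRLF recovery in
   decode_coding_string above counts the LFs first and then copies
   backwards, so the text can be expanded in place. A self-contained
   sketch follows; the name lf_to_crlf_in_place is hypothetical, and a
   plain byte array with an explicit CAPACITY stands in for the
   conversion buffer (the real code extends the buffer instead of
   failing). */

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Expand every LF in BUF[0..LEN) to CR LF in place. CAPACITY is the
   total size of BUF. Returns the new length, or (size_t) -1 if BUF
   is too small to hold the expanded text. */
static size_t
lf_to_crlf_in_place (unsigned char *buf, size_t len, size_t capacity)
{
  size_t num_eol = 0, i;
  unsigned char *src, *dst;

  /* First pass: count the LFs to learn how much the text grows. */
  for (i = 0; i < len; i++)
    if (buf[i] == '\n')
      num_eol++;

  if (len + num_eol > capacity)
    return (size_t) -1;

  /* Second pass: copy from the end towards the beginning so that no
     unread byte is overwritten, emitting CR before each LF. */
  src = buf + len;
  dst = src + num_eol;
  while (src > buf)
    {
      *--dst = *--src;
      if (*src == '\n')
        *--dst = '\r';
    }
  return len + num_eol;
}
```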
encode_coding_string (str, coding, nocopy)
     struct coding_system *coding;
struct conversion_buffer buf;
int from, to, to_byte;
int shrinked_bytes = 0;
int consumed, consumed_char, produced, produced_char;
if (SYMBOLP (coding->pre_write_conversion)
    && !NILP (Ffboundp (coding->pre_write_conversion)))
  str = run_pre_post_conversion_on_str (str, coding, 1);
to_byte = SBYTES (str);
/* Encoding routines determine the multibyteness of the source text
   by coding->src_multibyte. */
coding->src_multibyte = STRING_MULTIBYTE (str);
coding->dst_multibyte = 0;
if (! CODING_REQUIRE_ENCODING (coding))
coding->consumed = SBYTES (str);
coding->consumed_char = SCHARS (str);
if (STRING_MULTIBYTE (str))
str = Fstring_as_unibyte (str);
coding->produced = SBYTES (str);
coding->produced_char = SCHARS (str);
return (nocopy ? str : Fcopy_sequence (str));
if (coding->composing != COMPOSITION_DISABLED)
  coding_save_composition (coding, from, to, str);
/* Try to skip the leading and trailing ASCII characters. */
if (coding->type != coding_type_ccl)
SHRINK_CONVERSION_REGION (&from, &to_byte, coding, SDATA (str),
if (from == to_byte)
  return (nocopy ? str : Fcopy_sequence (str));
shrinked_bytes = from + (SBYTES (str) - to_byte);
len = encoding_buffer_size (coding, to_byte - from);
allocate_conversion_buffer (buf, len);
consumed = consumed_char = produced = produced_char = 0;
result = encode_coding (coding, SDATA (str) + from + consumed,
                        buf.data + produced, to_byte - from - consumed,
                        buf.size - produced);
consumed += coding->consumed;
consumed_char += coding->consumed_char;
produced += coding->produced;
produced_char += coding->produced_char;
if (result == CODING_FINISH_NORMAL
    || (result == CODING_FINISH_INSUFFICIENT_SRC
        && coding->consumed == 0))
/* Now result should be CODING_FINISH_INSUFFICIENT_DST. */
extend_conversion_buffer (&buf);
coding->consumed = consumed;
coding->consumed_char = consumed_char;
coding->produced = produced;
coding->produced_char = produced_char;
newstr = make_uninit_string (produced + shrinked_bytes);
STRING_COPYIN (newstr, 0, SDATA (str), from);
STRING_COPYIN (newstr, from, buf.data, produced);
if (shrinked_bytes > from)
  STRING_COPYIN (newstr, from + produced,
                 SDATA (str) + to_byte,
                 shrinked_bytes - from);
free_conversion_buffer (&buf);
coding_free_composition_data (coding);
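/* Illustration only, not part of coding.c: how FROM, TO_BYTE and
   SHRINKED_BYTES relate in the two string converters above. The real
   SHRINK_CONVERSION_REGION macro takes the coding system into account;
   this sketch assumes the simplest case, in which every byte below
   0x80 at either end may be skipped. Both helper names are
   hypothetical. */

```c
#include <assert.h>
#include <stddef.h>

/* Set *FROM to the index of the first non-ASCII byte of S[0..LEN) and
   *TO to one past the last non-ASCII byte. If S is pure ASCII the
   two indexes come out equal, meaning nothing needs converting. */
static void
shrink_ascii_region (const unsigned char *s, size_t len,
                     size_t *from, size_t *to)
{
  size_t beg = 0, end = len;

  while (beg < end && s[beg] < 0x80)
    beg++;
  while (end > beg && s[end - 1] < 0x80)
    end--;
  *from = beg;
  *to = end;
}

/* Bytes copied through unconverted: the skipped head plus the skipped
   tail, as in `from + (SBYTES (str) - to_byte)'. */
static size_t
skipped_bytes (size_t len, size_t from, size_t to)
{
  return from + (len - to);
}
```

/* The result string is then assembled as head (FROM bytes), converted
   middle, and tail (SHRINKED_BYTES - FROM bytes), which is what the
   three STRING_COPYIN calls above do. */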
/*** 8. Emacs Lisp library functions ***/

DEFUN ("coding-system-p", Fcoding_system_p, Scoding_system_p, 1, 1, 0,
       doc: /* Return t if OBJECT is nil or a coding-system.
See the documentation of `make-coding-system' for information
about coding-system objects. */)
/* Get coding-spec vector for OBJ. */
obj = Fget (obj, Qcoding_system);
return ((VECTORP (obj) && XVECTOR (obj)->size == 5)

DEFUN ("read-non-nil-coding-system", Fread_non_nil_coding_system,
       Sread_non_nil_coding_system, 1, 1, 0,
       doc: /* Read a coding system from the minibuffer, prompting with string PROMPT. */)
val = Fcompleting_read (prompt, Vcoding_system_alist, Qnil,
                        Qt, Qnil, Qcoding_system_history, Qnil, Qnil);
while (SCHARS (val) == 0);
return (Fintern (val, Qnil));

DEFUN ("read-coding-system", Fread_coding_system, Sread_coding_system, 1, 2, 0,
       doc: /* Read a coding system from the minibuffer, prompting with string PROMPT.
If the user enters null input, return second argument DEFAULT-CODING-SYSTEM. */)
     (prompt, default_coding_system)
     Lisp_Object prompt, default_coding_system;
if (SYMBOLP (default_coding_system))
  default_coding_system = SYMBOL_NAME (default_coding_system);
val = Fcompleting_read (prompt, Vcoding_system_alist, Qnil,
                        Qt, Qnil, Qcoding_system_history,
                        default_coding_system, Qnil);
return (SCHARS (val) == 0 ? Qnil : Fintern (val, Qnil));

DEFUN ("check-coding-system", Fcheck_coding_system, Scheck_coding_system,
       doc: /* Check validity of CODING-SYSTEM.
If valid, return CODING-SYSTEM, else signal a `coding-system-error' error.
It is valid if it is a symbol with a non-nil `coding-system' property.
The value of the property should be a vector of length 5. */)
     Lisp_Object coding_system;
CHECK_SYMBOL (coding_system);
if (!NILP (Fcoding_system_p (coding_system)))
  return coding_system;
Fsignal (Qcoding_system_error, Fcons (coding_system, Qnil));

detect_coding_system (src, src_bytes, highest, multibytep)
     const unsigned char *src;
     int src_bytes, highest;
int coding_mask, eol_type;
Lisp_Object val, tmp;
coding_mask = detect_coding_mask (src, src_bytes, NULL, &dummy, multibytep);
eol_type = detect_eol_type (src, src_bytes, &dummy);
if (eol_type == CODING_EOL_INCONSISTENT)
  eol_type = CODING_EOL_UNDECIDED;
if (eol_type != CODING_EOL_UNDECIDED)
val2 = Fget (Qundecided, Qeol_type);
val = XVECTOR (val2)->contents[eol_type];
return (highest ? val : Fcons (val, Qnil));
/* First, gather possible coding systems in VAL. */
for (tmp = Vcoding_category_list; CONSP (tmp); tmp = XCDR (tmp))
Lisp_Object category_val, category_index;
category_index = Fget (XCAR (tmp), Qcoding_category_index);
category_val = Fsymbol_value (XCAR (tmp));
if (!NILP (category_val)
    && NATNUMP (category_index)
    && (coding_mask & (1 << XFASTINT (category_index))))
val = Fcons (category_val, val);
val = Fnreverse (val);
/* Then, replace the elements with subsidiary coding systems. */
for (tmp = val; CONSP (tmp); tmp = XCDR (tmp))
if (eol_type != CODING_EOL_UNDECIDED
    && eol_type != CODING_EOL_INCONSISTENT)
eol = Fget (XCAR (tmp), Qeol_type);
XSETCAR (tmp, XVECTOR (eol)->contents[eol_type]);
return (highest ? XCAR (val) : val);
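/* Illustration only, not part of coding.c: the first loop of
   detect_coding_system above walks the category list in priority order
   and keeps each category whose bit is set in CODING_MASK. The sketch
   below uses plain ints in place of Lisp objects; the name
   collect_candidates and the 32-bit limit are assumptions of the
   sketch. */

```c
#include <assert.h>
#include <stddef.h>

/* PRIORITY holds N category indexes, highest priority first. Copy to
   OUT, in the same order, those whose bit is set in MASK, and return
   how many were copied. OUT must have room for N entries. */
static size_t
collect_candidates (const int *priority, size_t n, unsigned mask, int *out)
{
  size_t i, count = 0;

  for (i = 0; i < n; i++)
    if (priority[i] >= 0 && priority[i] < 32
        && (mask & (1u << priority[i])))
      out[count++] = priority[i];
  return count;
}
```

/* Because candidates are appended in priority order, the first entry
   of OUT plays the role of the value returned when HIGHEST is
   non-nil. */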

DEFUN ("detect-coding-region", Fdetect_coding_region, Sdetect_coding_region,
       doc: /* Detect how the byte sequence in the region is encoded.
Return a list of possible coding systems used on decoding a byte
sequence containing the bytes in the region between START and END when
the coding system `undecided' is specified. The list is ordered by
priority decided in the current language environment.

If only ASCII characters are found, it returns a list of single element
`undecided' or its subsidiary coding system according to a detected

If optional argument HIGHEST is non-nil, return the coding system of
highest priority. */)
     (start, end, highest)
     Lisp_Object start, end, highest;
int from_byte, to_byte;
int include_anchor_byte = 0;
CHECK_NUMBER_COERCE_MARKER (start);
CHECK_NUMBER_COERCE_MARKER (end);
validate_region (&start, &end);
from = XINT (start), to = XINT (end);
from_byte = CHAR_TO_BYTE (from);
to_byte = CHAR_TO_BYTE (to);
if (from < GPT && to >= GPT)
  move_gap_both (to, to_byte);
/* If an anchor byte `\0' follows the region, we include it in
   the detecting source. Then code detectors can handle the trailing
   byte sequence more accurately.

   Fix me: This is not a perfect solution. It is better that we
   add one more argument, say LAST_BLOCK, to all detect_coding_XXX.
if (to == Z || (to == GPT && GAP_SIZE > 0))
  include_anchor_byte = 1;
return detect_coding_system (BYTE_POS_ADDR (from_byte),
                             to_byte - from_byte + include_anchor_byte,
                             !NILP (current_buffer
                                    ->enable_multibyte_characters));

DEFUN ("detect-coding-string", Fdetect_coding_string, Sdetect_coding_string,
       doc: /* Detect how the byte sequence in STRING is encoded.
Return a list of possible coding systems used on decoding a byte
sequence containing the bytes in STRING when the coding system
`undecided' is specified. The list is ordered by priority decided in
the current language environment.

If only ASCII characters are found, it returns a list of single element
`undecided' or its subsidiary coding system according to a detected

If optional argument HIGHEST is non-nil, return the coding system of
highest priority. */)
     Lisp_Object string, highest;
CHECK_STRING (string);
return detect_coding_system (SDATA (string),
                             /* "+ 1" is to include the anchor byte
                                `\0'. With this, code detectors can
                                handle the trailing bytes more
                             SBYTES (string) + 1,
                             STRING_MULTIBYTE (string));

/* Subroutine for Ffind_coding_systems_region_internal.

   Return a list of coding systems that safely encode the multibyte
   text between P and PEND. SAFE_CODINGS, if non-nil, is an alist of
   possible coding systems. If it is nil, it means that we have not
   yet found any coding systems.

   WORK_TABLE is a copy of the char-table Vchar_coding_system_table. An
   element of WORK_TABLE is set to t once the element is looked up.

   If a non-ASCII single byte char is found, set
   *single_byte_char_found to 1. */

find_safe_codings (p, pend, safe_codings, work_table, single_byte_char_found)
     unsigned char *p, *pend;
     Lisp_Object safe_codings, work_table;
     int *single_byte_char_found;
Lisp_Object val, ch;
Lisp_Object prev, tail;
c = STRING_CHAR_AND_LENGTH (p, pend - p, len);
if (ASCII_BYTE_P (c))
  /* We can ignore ASCII characters here. */
if (SINGLE_BYTE_CHAR_P (c))
  *single_byte_char_found = 1;
if (NILP (safe_codings))
  /* Already all coding systems are excluded. But, we can't
     terminate the loop here because non-ASCII single-byte char
/* Check the safe coding systems for C. */
ch = make_number (c);
val = Faref (work_table, ch);
/* This element was already checked. Ignore it. */
/* Remember that we checked this element. */
Faset (work_table, ch, Qt);
for (prev = tail = safe_codings; CONSP (tail); tail = XCDR (tail))
Lisp_Object elt, translation_table, hash_table, accept_latin_extra;
if (CONSP (XCDR (elt)))
/* This entry has this format now:
   ( CODING SAFE-CHARS TRANSLATION-TABLE HASH-TABLE
     ACCEPT-LATIN-EXTRA ) */
encodable = ! NILP (Faref (XCAR (val), ch));
translation_table = XCAR (val);
hash_table = XCAR (XCDR (val));
accept_latin_extra = XCAR (XCDR (XCDR (val)));
/* This entry has this format now: ( CODING . SAFE-CHARS) */
encodable = ! NILP (Faref (XCDR (elt), ch));
/* Transform the format to:
   ( CODING SAFE-CHARS TRANSLATION-TABLE HASH-TABLE
     ACCEPT-LATIN-EXTRA ) */
val = Fget (XCAR (elt), Qcoding_system);
= Fplist_get (AREF (val, 3),
              Qtranslation_table_for_encode);
if (SYMBOLP (translation_table))
  translation_table = Fget (translation_table,
                            Qtranslation_table);
= (CHAR_TABLE_P (translation_table)
   ? XCHAR_TABLE (translation_table)->extras[1]
= ((EQ (AREF (val, 0), make_number (2))
    && VECTORP (AREF (val, 4)))
   ? AREF (AREF (val, 4), 16)
XSETCAR (tail, list5 (XCAR (elt), XCDR (elt),
                      translation_table, hash_table,
                      accept_latin_extra));
&& ((CHAR_TABLE_P (translation_table)
     && ! NILP (Faref (translation_table, ch)))
    || (HASH_TABLE_P (hash_table)
        && ! NILP (Fgethash (ch, hash_table, Qnil)))
    || (SINGLE_BYTE_CHAR_P (c)
        && ! NILP (accept_latin_extra)
        && VECTORP (Vlatin_extra_code_table)
        && ! NILP (AREF (Vlatin_extra_code_table, c)))))
/* Exclude this coding system from SAFE_CODINGS. */
if (EQ (tail, safe_codings))
  safe_codings = XCDR (safe_codings);
XSETCDR (prev, XCDR (tail));
return safe_codings;
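/* Illustration only, not part of coding.c: the PREV/TAIL walk that
   find_safe_codings above uses to drop entries from SAFE_CODINGS
   while iterating. The sketch uses a plain singly linked list; struct
   candidate, its max_code field and the "can encode C iff
   C <= max_code" rule are hypothetical stand-ins for the real
   safe-chars, translation-table and hash-table checks. */

```c
#include <assert.h>
#include <stddef.h>

struct candidate
{
  const char *name;          /* Coding system name (illustrative). */
  unsigned char max_code;    /* Toy rule: encodes codes <= max_code. */
  struct candidate *next;
};

/* Unlink from the list starting at HEAD every candidate that cannot
   encode C, and return the new head (NULL if none is left). */
static struct candidate *
exclude_unsafe (struct candidate *head, unsigned char c)
{
  struct candidate *prev = head, *tail;

  for (tail = head; tail != NULL; tail = tail->next)
    {
      if (c > tail->max_code)
        {
          /* Exclude this candidate; PREV stays where it is. */
          if (tail == head)
            head = head->next;
          else
            prev->next = tail->next;
        }
      else
        prev = tail;
    }
  return head;
}
```

/* An unlinked node keeps its own next pointer, so advancing TAIL
   after the removal is still valid, as in the Lisp version. */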

DEFUN ("find-coding-systems-region-internal",
       Ffind_coding_systems_region_internal,
       Sfind_coding_systems_region_internal, 2, 2, 0,
       doc: /* Internal use only. */)
     Lisp_Object start, end;
Lisp_Object work_table, safe_codings;
int non_ascii_p = 0;
int single_byte_char_found = 0;
const unsigned char *p1, *p1end, *p2, *p2end, *p;
if (STRINGP (start))
if (!STRING_MULTIBYTE (start))
p1 = SDATA (start), p1end = p1 + SBYTES (start);
if (SCHARS (start) != SBYTES (start))
CHECK_NUMBER_COERCE_MARKER (start);
CHECK_NUMBER_COERCE_MARKER (end);
if (XINT (start) < BEG || XINT (end) > Z || XINT (start) > XINT (end))
  args_out_of_range (start, end);
if (NILP (current_buffer->enable_multibyte_characters))
from = CHAR_TO_BYTE (XINT (start));
to = CHAR_TO_BYTE (XINT (end));
stop = from < GPT_BYTE && GPT_BYTE < to ? GPT_BYTE : to;
p1 = BYTE_POS_ADDR (from), p1end = p1 + (stop - from);
p2 = BYTE_POS_ADDR (stop), p2end = p2 + (to - stop);
if (XINT (end) - XINT (start) != to - from)
/* We are sure that the text contains no multibyte character.
   Check if it contains eight-bit-graphic. */
for (p = p1; p < p1end && ASCII_BYTE_P (*p); p++);
for (p = p2; p < p2end && ASCII_BYTE_P (*p); p++);
/* The text contains non-ASCII characters. */
work_table = Fmake_char_table (Qchar_coding_system, Qnil);
safe_codings = Fcopy_sequence (XCDR (Vcoding_system_safe_chars));
safe_codings = find_safe_codings (p1, p1end, safe_codings, work_table,
                                  &single_byte_char_found);
safe_codings = find_safe_codings (p2, p2end, safe_codings, work_table,
                                  &single_byte_char_found);
if (EQ (safe_codings, XCDR (Vcoding_system_safe_chars)))
/* Turn safe_codings to a list of coding systems... */
if (single_byte_char_found)
  /* ... and append these for eight-bit chars. */
  val = Fcons (Qraw_text,
               Fcons (Qemacs_mule, Fcons (Qno_conversion, Qnil)));
/* ... and append generic coding systems. */
val = Fcopy_sequence (XCAR (Vcoding_system_safe_chars));
for (; CONSP (safe_codings); safe_codings = XCDR (safe_codings))
  val = Fcons (XCAR (XCAR (safe_codings)), val);
return safe_codings;

/* Search from position POS for characters that are unencodable
   according to SAFE_CHARS, and return a list of their positions. P
   points to the place in memory where the character at POS is stored.
   Stop the search at PEND or when N unencodable characters have been
   found.

   If SAFE_CHARS is a char table, an element for an unencodable

   If SAFE_CHARS is nil, all non-ASCII characters are unencodable.

   Otherwise, SAFE_CHARS is t, and only eight-bit-control and
   eight-bit-graphic characters are unencodable. */

unencodable_char_position (safe_chars, pos, p, pend, n)
     Lisp_Object safe_chars;
     unsigned char *p, *pend;
Lisp_Object pos_list;
int c = STRING_CHAR_AND_LENGTH (p, MAX_MULTIBYTE_LENGTH, len);
&& (CHAR_TABLE_P (safe_chars)
    ? NILP (CHAR_TABLE_REF (safe_chars, c))
    : (NILP (safe_chars) || c < 256)))
pos_list = Fcons (make_number (pos), pos_list);
return Fnreverse (pos_list);
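/* Illustration only, not part of coding.c: the bounded scan that
   unencodable_char_position above performs. The real function decodes
   multibyte characters with STRING_CHAR_AND_LENGTH and consults a Lisp
   char table; this sketch assumes one byte per character and a
   256-entry SAFE array, and the function name is hypothetical. */

```c
#include <assert.h>
#include <stddef.h>

/* Scan P[0..LEN) and store in OUT the positions of non-ASCII bytes
   for which SAFE[byte] is zero. Stop at the end of the text or once
   N positions have been found. Returns the number of positions
   stored; OUT must have room for N entries. */
static size_t
unencodable_positions (const unsigned char *p, size_t len,
                       const unsigned char safe[256],
                       size_t n, size_t *out)
{
  size_t pos, found = 0;

  for (pos = 0; pos < len && found < n; pos++)
    if (p[pos] >= 0x80 && !safe[p[pos]])
      out[found++] = pos;
  return found;
}
```

/* Positions are appended in scan order here, so no final reversal is
   needed, unlike the consed list in the Lisp version. */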

DEFUN ("unencodable-char-position", Funencodable_char_position,
       Sunencodable_char_position, 3, 5, 0,
Return position of first un-encodable character in a region.
START and END specify the region and CODING-SYSTEM specifies the
encoding to check. Return nil if CODING-SYSTEM does encode the region.

If optional 4th argument COUNT is non-nil, it specifies at most how
many un-encodable characters to search. In this case, the value is a

If optional 5th argument STRING is non-nil, it is a string to search
for un-encodable characters. In that case, START and END are indexes
     (start, end, coding_system, count, string)
     Lisp_Object start, end, coding_system, count, string;
Lisp_Object safe_chars;
struct coding_system coding;
Lisp_Object positions;
unsigned char *p, *pend;
validate_region (&start, &end);
from = XINT (start);
if (NILP (current_buffer->enable_multibyte_characters))
p = CHAR_POS_ADDR (from);
pend = CHAR_POS_ADDR (to);
CHECK_STRING (string);
CHECK_NATNUM (start);
from = XINT (start);
|| to > SCHARS (string))
  args_out_of_range_3 (string, start, end);
if (! STRING_MULTIBYTE (string))
p = SDATA (string) + string_char_to_byte (string, from);
pend = SDATA (string) + string_char_to_byte (string, to);
setup_coding_system (Fcheck_coding_system (coding_system), &coding);
CHECK_NATNUM (count);
if (coding.type == coding_type_no_conversion
    || coding.type == coding_type_raw_text)
if (coding.type == coding_type_undecided)
safe_chars = coding_safe_chars (coding_system);
if (STRINGP (string)
    || from >= GPT || to <= GPT)
  positions = unencodable_char_position (safe_chars, from, p, pend, n);
Lisp_Object args[2];
args[0] = unencodable_char_position (safe_chars, from, p, GPT_ADDR, n);
n -= XINT (Flength (args[0]));
positions = args[0];
args[1] = unencodable_char_position (safe_chars, GPT, GAP_END_ADDR,
positions = Fappend (2, args);
return (NILP (count) ? Fcar (positions) : positions);

code_convert_region1 (start, end, coding_system, encodep)
     Lisp_Object start, end, coding_system;
struct coding_system coding;
CHECK_NUMBER_COERCE_MARKER (start);
CHECK_NUMBER_COERCE_MARKER (end);
CHECK_SYMBOL (coding_system);
validate_region (&start, &end);
from = XFASTINT (start);
to = XFASTINT (end);
if (NILP (coding_system))
  return make_number (to - from);
if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0)
  error ("Invalid coding system: %s", SDATA (SYMBOL_NAME (coding_system)));
coding.mode |= CODING_MODE_LAST_BLOCK;
coding.src_multibyte = coding.dst_multibyte
  = !NILP (current_buffer->enable_multibyte_characters);
code_convert_region (from, CHAR_TO_BYTE (from), to, CHAR_TO_BYTE (to),
                     &coding, encodep, 1);
Vlast_coding_system_used = coding.symbol;
return make_number (coding.produced_char);

DEFUN ("decode-coding-region", Fdecode_coding_region, Sdecode_coding_region,
       3, 3, "r\nzCoding system: ",
       doc: /* Decode the current region from the specified coding system.
When called from a program, takes three arguments:
START, END, and CODING-SYSTEM.  START and END are buffer positions.
This function sets `last-coding-system-used' to the precise coding system
used (which may be different from CODING-SYSTEM if CODING-SYSTEM is
not fully specified.)
It returns the length of the decoded text.  */)
     (start, end, coding_system)
     Lisp_Object start, end, coding_system;
  return code_convert_region1 (start, end, coding_system, 0);

DEFUN ("encode-coding-region", Fencode_coding_region, Sencode_coding_region,
       3, 3, "r\nzCoding system: ",
       doc: /* Encode the current region into the specified coding system.
When called from a program, takes three arguments:
START, END, and CODING-SYSTEM.  START and END are buffer positions.
This function sets `last-coding-system-used' to the precise coding system
used (which may be different from CODING-SYSTEM if CODING-SYSTEM is
not fully specified.)
It returns the length of the encoded text.  */)
     (start, end, coding_system)
     Lisp_Object start, end, coding_system;
  return code_convert_region1 (start, end, coding_system, 1);

code_convert_string1 (string, coding_system, nocopy, encodep)
     Lisp_Object string, coding_system, nocopy;
  struct coding_system coding;

  CHECK_STRING (string);
  CHECK_SYMBOL (coding_system);

  if (NILP (coding_system))
    return (NILP (nocopy) ? Fcopy_sequence (string) : string);

  if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0)
    error ("Invalid coding system: %s", SDATA (SYMBOL_NAME (coding_system)));

  coding.mode |= CODING_MODE_LAST_BLOCK;
            ? encode_coding_string (string, &coding, !NILP (nocopy))
            : decode_coding_string (string, &coding, !NILP (nocopy)));
  Vlast_coding_system_used = coding.symbol;

DEFUN ("decode-coding-string", Fdecode_coding_string, Sdecode_coding_string,
       doc: /* Decode STRING which is encoded in CODING-SYSTEM, and return the result.
Optional arg NOCOPY non-nil means it is OK to return STRING itself
if the decoding operation is trivial.
This function sets `last-coding-system-used' to the precise coding system
used (which may be different from CODING-SYSTEM if CODING-SYSTEM is
not fully specified.)  */)
     (string, coding_system, nocopy)
     Lisp_Object string, coding_system, nocopy;
  return code_convert_string1 (string, coding_system, nocopy, 0);

DEFUN ("encode-coding-string", Fencode_coding_string, Sencode_coding_string,
       doc: /* Encode STRING to CODING-SYSTEM, and return the result.
Optional arg NOCOPY non-nil means it is OK to return STRING itself
if the encoding operation is trivial.
This function sets `last-coding-system-used' to the precise coding system
used (which may be different from CODING-SYSTEM if CODING-SYSTEM is
not fully specified.)  */)
     (string, coding_system, nocopy)
     Lisp_Object string, coding_system, nocopy;
  return code_convert_string1 (string, coding_system, nocopy, 1);

/* Encode or decode STRING according to CODING_SYSTEM.
   Do not set Vlast_coding_system_used.

   This function is called only from macros DECODE_FILE and
   ENCODE_FILE, thus we ignore character composition.  */

code_convert_string_norecord (string, coding_system, encodep)
     Lisp_Object string, coding_system;
  struct coding_system coding;

  CHECK_STRING (string);
  CHECK_SYMBOL (coding_system);

  if (NILP (coding_system))
  if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0)
    error ("Invalid coding system: %s", SDATA (SYMBOL_NAME (coding_system)));

  coding.composing = COMPOSITION_DISABLED;
  coding.mode |= CODING_MODE_LAST_BLOCK;
            ? encode_coding_string (string, &coding, 1)
            : decode_coding_string (string, &coding, 1));

DEFUN ("decode-sjis-char", Fdecode_sjis_char, Sdecode_sjis_char, 1, 1, 0,
       doc: /* Decode a Japanese character which has CODE in shift_jis encoding.
Return the corresponding character.  */)
  unsigned char c1, c2, s1, s2;
  CHECK_NUMBER (code);
  s1 = (XFASTINT (code)) >> 8, s2 = (XFASTINT (code)) & 0xFF;
        XSETFASTINT (val, s2);
      else if (s2 >= 0xA0 && s2 <= 0xDF)
        XSETFASTINT (val, MAKE_CHAR (charset_katakana_jisx0201, s2, 0));
        error ("Invalid Shift JIS code: %x", XFASTINT (code));
      if ((s1 < 0x80 || (s1 > 0x9F && s1 < 0xE0) || s1 > 0xEF)
          || (s2 < 0x40 || s2 == 0x7F || s2 > 0xFC))
        error ("Invalid Shift JIS code: %x", XFASTINT (code));
      DECODE_SJIS (s1, s2, c1, c2);
      XSETFASTINT (val, MAKE_CHAR (charset_jisx0208, c1, c2));

DEFUN ("encode-sjis-char", Fencode_sjis_char, Sencode_sjis_char, 1, 1, 0,
       doc: /* Encode a Japanese character CHAR to shift_jis encoding.
Return the corresponding code in SJIS.  */)
  int charset, c1, c2, s1, s2;
  SPLIT_CHAR (XFASTINT (ch), charset, c1, c2);
  if (charset == CHARSET_ASCII)
  else if (charset == charset_jisx0208
           && c1 > 0x20 && c1 < 0x7F && c2 > 0x20 && c2 < 0x7F)
      ENCODE_SJIS (c1, c2, s1, s2);
      XSETFASTINT (val, (s1 << 8) | s2);
  else if (charset == charset_katakana_jisx0201
           && c1 > 0x20 && c2 < 0xE0)
      XSETFASTINT (val, c1 | 0x80);
    error ("Can't encode to shift_jis: %d", XFASTINT (ch));
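
/* Illustrative sketch, not part of coding.c.  The two functions above
   rely on the DECODE_SJIS and ENCODE_SJIS macros, which are defined
   elsewhere.  The standalone example below shows the usual arithmetic
   for mapping between a two-byte Shift_JIS code and the JIS X 0208
   row/cell bytes (each 0x21..0x7E).  The function names are
   hypothetical, and the lead-byte range 0x81..0x9F / 0xE0..0xEF is the
   assumption made here.  */

```c
#include <assert.h>

/* Convert the Shift_JIS byte pair S1 S2 to JIS X 0208 bytes *J1 *J2.
   The caller is expected to have validated the input.  */
void
sjis_to_jis (int s1, int s2, int *j1, int *j2)
{
  assert ((s1 >= 0x81 && s1 <= 0x9F) || (s1 >= 0xE0 && s1 <= 0xEF));
  assert (s2 >= 0x40 && s2 <= 0xFC && s2 != 0x7F);

  if (s2 >= 0x9F)
    {
      /* Second half of the lead byte: an even JIS row.  */
      *j1 = s1 * 2 - (s1 >= 0xE0 ? 0x160 : 0xE0);
      *j2 = s2 - 0x7E;
    }
  else
    {
      /* First half of the lead byte: an odd JIS row.  */
      *j1 = s1 * 2 - (s1 >= 0xE0 ? 0x161 : 0xE1);
      *j2 = s2 - (s2 >= 0x7F ? 0x20 : 0x1F);
    }
}

/* Convert JIS X 0208 bytes J1 J2 to the Shift_JIS byte pair *S1 *S2.  */
void
jis_to_sjis (int j1, int j2, int *s1, int *s2)
{
  assert (j1 >= 0x21 && j1 <= 0x7E && j2 >= 0x21 && j2 <= 0x7E);

  if (j1 & 1)
    {
      *s1 = j1 / 2 + (j1 < 0x5F ? 0x71 : 0xB1);
      *s2 = j2 + (j2 >= 0x60 ? 0x20 : 0x1F);   /* skips 0x7F */
    }
  else
    {
      *s1 = j1 / 2 + (j1 < 0x5F ? 0x70 : 0xB0);
      *s2 = j2 + 0x7E;
    }
}
```

/* For example, Shift_JIS 0x82A0 corresponds to JIS X 0208 0x2422, and
   every JIS pair in 0x21..0x7E survives a round trip through both
   functions.  */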

DEFUN ("decode-big5-char", Fdecode_big5_char, Sdecode_big5_char, 1, 1, 0,
       doc: /* Decode a Big5 character which has CODE in BIG5 coding system.
Return the corresponding character.  */)
  unsigned char b1, b2, c1, c2;
  CHECK_NUMBER (code);
  b1 = (XFASTINT (code)) >> 8, b2 = (XFASTINT (code)) & 0xFF;
        error ("Invalid BIG5 code: %x", XFASTINT (code));
      if ((b1 < 0xA1 || b1 > 0xFE)
          || (b2 < 0x40 || (b2 > 0x7E && b2 < 0xA1) || b2 > 0xFE))
        error ("Invalid BIG5 code: %x", XFASTINT (code));
      DECODE_BIG5 (b1, b2, charset, c1, c2);
      XSETFASTINT (val, MAKE_CHAR (charset, c1, c2));
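
/* Illustrative sketch, not part of coding.c.  The range test used
   above for a two-byte Big5 code can be read as a small predicate:
   the lead byte must be in 0xA1..0xFE and the trail byte in
   0x40..0x7E or 0xA1..0xFE.  `big5_two_byte_code_valid_p' is a
   hypothetical name, and single-byte codes are deliberately left out
   of this sketch.  */

```c
#include <assert.h>

/* Return nonzero if CODE is a well-formed two-byte Big5 code.  */
int
big5_two_byte_code_valid_p (unsigned code)
{
  unsigned b1, b2;

  if (code > 0xFFFF)
    return 0;
  b1 = code >> 8;
  b2 = code & 0xFF;
  assert (b1 <= 0xFF);

  if (b1 < 0xA1 || b1 > 0xFE)
    return 0;
  return (b2 >= 0x40 && b2 <= 0x7E) || (b2 >= 0xA1 && b2 <= 0xFE);
}
```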

DEFUN ("encode-big5-char", Fencode_big5_char, Sencode_big5_char, 1, 1, 0,
       doc: /* Encode the Big5 character CHAR to BIG5 coding system.
Return the corresponding character code in Big5.  */)
  int charset, c1, c2, b1, b2;
  SPLIT_CHAR (XFASTINT (ch), charset, c1, c2);
  if (charset == CHARSET_ASCII)
  else if ((charset == charset_big5_1
            && (XFASTINT (ch) >= 0x250a1 && XFASTINT (ch) <= 0x271ec))
           || (charset == charset_big5_2
               && XFASTINT (ch) >= 0x290a1 && XFASTINT (ch) <= 0x2bdb2))
      ENCODE_BIG5 (charset, c1, c2, b1, b2);
      XSETFASTINT (val, (b1 << 8) | b2);
    error ("Can't encode to Big5: %d", XFASTINT (ch));

DEFUN ("set-terminal-coding-system-internal", Fset_terminal_coding_system_internal,
       Sset_terminal_coding_system_internal, 1, 1, 0,
       doc: /* Internal use only.  */)
     Lisp_Object coding_system;
  CHECK_SYMBOL (coding_system);
  setup_coding_system (Fcheck_coding_system (coding_system), &terminal_coding);
  /* We had better not send unsafe characters to terminal.  */
  terminal_coding.mode |= CODING_MODE_INHIBIT_UNENCODABLE_CHAR;
  /* Character composition should be disabled.  */
  terminal_coding.composing = COMPOSITION_DISABLED;
  /* Error notification should be suppressed.  */
  terminal_coding.suppress_error = 1;
  terminal_coding.src_multibyte = 1;
  terminal_coding.dst_multibyte = 0;

DEFUN ("set-safe-terminal-coding-system-internal", Fset_safe_terminal_coding_system_internal,
       Sset_safe_terminal_coding_system_internal, 1, 1, 0,
       doc: /* Internal use only.  */)
     Lisp_Object coding_system;
  CHECK_SYMBOL (coding_system);
  setup_coding_system (Fcheck_coding_system (coding_system),
                       &safe_terminal_coding);
  /* Character composition should be disabled.  */
  safe_terminal_coding.composing = COMPOSITION_DISABLED;
  /* Error notification should be suppressed.  */
  safe_terminal_coding.suppress_error = 1;
  safe_terminal_coding.src_multibyte = 1;
  safe_terminal_coding.dst_multibyte = 0;

DEFUN ("terminal-coding-system", Fterminal_coding_system,
       Sterminal_coding_system, 0, 0, 0,
       doc: /* Return coding system specified for terminal output.  */)
  return terminal_coding.symbol;

DEFUN ("set-keyboard-coding-system-internal", Fset_keyboard_coding_system_internal,
       Sset_keyboard_coding_system_internal, 1, 1, 0,
       doc: /* Internal use only.  */)
     Lisp_Object coding_system;
  CHECK_SYMBOL (coding_system);
  setup_coding_system (Fcheck_coding_system (coding_system), &keyboard_coding);
  /* Character composition should be disabled.  */
  keyboard_coding.composing = COMPOSITION_DISABLED;

DEFUN ("keyboard-coding-system", Fkeyboard_coding_system,
       Skeyboard_coding_system, 0, 0, 0,
       doc: /* Return coding system specified for decoding keyboard input.  */)
  return keyboard_coding.symbol;

DEFUN ("find-operation-coding-system", Ffind_operation_coding_system,
       Sfind_operation_coding_system, 1, MANY, 0,
       doc: /* Choose a coding system for an operation based on the target name.
The value names a pair of coding systems: (DECODING-SYSTEM . ENCODING-SYSTEM).
DECODING-SYSTEM is the coding system to use for decoding
\(in case OPERATION does decoding), and ENCODING-SYSTEM is the coding system
for encoding (in case OPERATION does encoding).

The first argument OPERATION specifies an I/O primitive:
  For file I/O, `insert-file-contents' or `write-region'.
  For process I/O, `call-process', `call-process-region', or `start-process'.
  For network I/O, `open-network-stream'.

The remaining arguments should be the same arguments that were passed
to the primitive.  Depending on which primitive, one of those arguments
is selected as the TARGET.  For example, if OPERATION does file I/O,
whichever argument specifies the file name is TARGET.

TARGET has a meaning which depends on OPERATION:
  For file I/O, TARGET is a file name.
  For process I/O, TARGET is a process name.
  For network I/O, TARGET is a service name or a port number.

This function looks up what is specified for TARGET in
`file-coding-system-alist', `process-coding-system-alist',
or `network-coding-system-alist', depending on OPERATION.
They may specify a coding system, a cons of coding systems,
or a function symbol to call.
In the last case, we call the function with one argument,
which is a list of all the arguments given to this function.

usage: (find-operation-coding-system OPERATION ARGUMENTS ...)  */)
  Lisp_Object operation, target_idx, target, val;
  register Lisp_Object chain;
    error ("Too few arguments");
  operation = args[0];
  if (!SYMBOLP (operation)
      || !INTEGERP (target_idx = Fget (operation, Qtarget_idx)))
    error ("Invalid first argument");
  if (nargs < 1 + XINT (target_idx))
    error ("Too few arguments for operation: %s",
           SDATA (SYMBOL_NAME (operation)));
  /* For write-region, if the 6th argument (i.e. VISIT, the 5th
     argument to write-region) is a string, it must be treated as a
     target file name.  */
  if (EQ (operation, Qwrite_region)
      && STRINGP (args[5]))
    target_idx = make_number (4);
  target = args[XINT (target_idx) + 1];
  if (!(STRINGP (target)
        || (EQ (operation, Qopen_network_stream) && INTEGERP (target))))
    error ("Invalid argument %d", XINT (target_idx) + 1);

  chain = ((EQ (operation, Qinsert_file_contents)
            || EQ (operation, Qwrite_region))
           ? Vfile_coding_system_alist
           : (EQ (operation, Qopen_network_stream)
              ? Vnetwork_coding_system_alist
              : Vprocess_coding_system_alist));
  for (; CONSP (chain); chain = XCDR (chain))
          && ((STRINGP (target)
               && STRINGP (XCAR (elt))
               && fast_string_match (XCAR (elt), target) >= 0)
              || (INTEGERP (target) && EQ (target, XCAR (elt)))))
          /* Here, if VAL is both a valid coding system and a valid
             function symbol, we return VAL as a coding system.  */
          if (! SYMBOLP (val))
          if (! NILP (Fcoding_system_p (val)))
            return Fcons (val, val);
          if (! NILP (Ffboundp (val)))
              val = call1 (val, Flist (nargs, args));
              if (SYMBOLP (val) && ! NILP (Fcoding_system_p (val)))
                return Fcons (val, val);
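
/* Illustrative sketch, not part of coding.c.  The loop above walks an
   alist of (PATTERN . VAL) pairs and takes the first entry whose
   pattern matches TARGET.  The standalone example below shows that
   first-match lookup with a plain C array.  `struct coding_rule' and
   `find_rule' are hypothetical names, and POSIX extended regular
   expressions are assumed here as a stand-in for Emacs regexps and
   `fast_string_match'.  */

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical rule: a pattern plus the pair of coding system names
   to use when the pattern matches.  */
struct coding_rule
{
  const char *pattern;
  const char *decoding;
  const char *encoding;
};

/* Return the first of the N rules in RULES whose pattern matches
   TARGET, or NULL if none does.  Rules whose pattern fails to compile
   are skipped.  */
const struct coding_rule *
find_rule (const struct coding_rule *rules, size_t n, const char *target)
{
  assert (rules != NULL && target != NULL);

  for (size_t i = 0; i < n; i++)
    {
      regex_t re;
      int matched;

      if (regcomp (&re, rules[i].pattern, REG_EXTENDED | REG_NOSUB) != 0)
        continue;
      matched = regexec (&re, target, 0, NULL, 0) == 0;
      regfree (&re);
      if (matched)
        return &rules[i];
    }
  return NULL;
}
```

/* Because the first match wins, more specific patterns belong earlier
   in the table than catch-all ones.  */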

DEFUN ("update-coding-systems-internal", Fupdate_coding_systems_internal,
       Supdate_coding_systems_internal, 0, 0, 0,
       doc: /* Update internal database for ISO2022 and CCL based coding systems.
When values of any coding categories are changed, you must
call this function.  */)
  for (i = CODING_CATEGORY_IDX_EMACS_MULE; i < CODING_CATEGORY_IDX_MAX; i++)
      val = SYMBOL_VALUE (XVECTOR (Vcoding_category_table)->contents[i]);
          if (! coding_system_table[i])
            coding_system_table[i] = ((struct coding_system *)
                                      xmalloc (sizeof (struct coding_system)));
          setup_coding_system (val, coding_system_table[i]);
      else if (coding_system_table[i])
          xfree (coding_system_table[i]);
          coding_system_table[i] = NULL;

DEFUN ("set-coding-priority-internal", Fset_coding_priority_internal,
       Sset_coding_priority_internal, 0, 0, 0,
       doc: /* Update internal database for the current value of `coding-category-list'.
This function is for internal use only.  */)
  val = Vcoding_category_list;
  while (CONSP (val) && i < CODING_CATEGORY_IDX_MAX)
      if (! SYMBOLP (XCAR (val)))
      idx = XFASTINT (Fget (XCAR (val), Qcoding_category_index));
      if (idx >= CODING_CATEGORY_IDX_MAX)
      coding_priorities[i++] = (1 << idx);
  /* If coding-category-list is valid and contains all coding
     categories, `i' should be CODING_CATEGORY_IDX_MAX now.  If not,
     the following code saves Emacs from crashing.  */
  while (i < CODING_CATEGORY_IDX_MAX)
    coding_priorities[i++] = CODING_CATEGORY_MASK_RAW_TEXT;
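
/* Illustrative sketch, not part of coding.c.  The function above turns
   an ordered list of category indices into an array of one-bit masks
   and pads whatever is left with a fallback mask, so that later code
   never reads an unset slot.  The standalone example below shows that
   pattern.  `SK_CATEGORY_MAX', `SK_FALLBACK_MASK' and
   `fill_priorities' are hypothetical, and stopping at the first
   out-of-range index is an assumption of this sketch.  */

```c
#include <assert.h>

enum { SK_CATEGORY_MAX = 8 };             /* hypothetical table size */
#define SK_FALLBACK_MASK (1u << 7)        /* hypothetical fallback category */

/* Fill PRIORITIES from the N category indices in ORDER, highest
   priority first, then pad the remaining slots with the fallback.  */
void
fill_priorities (const int *order, int n, unsigned priorities[SK_CATEGORY_MAX])
{
  int i = 0;

  assert (n >= 0);

  for (int k = 0; k < n && i < SK_CATEGORY_MAX; k++)
    {
      if (order[k] < 0 || order[k] >= SK_CATEGORY_MAX)
        break;                            /* stop at the first bad entry */
      priorities[i++] = 1u << order[k];
    }

  /* Pad so that every slot holds a usable mask.  */
  while (i < SK_CATEGORY_MAX)
    priorities[i++] = SK_FALLBACK_MASK;
}
```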

DEFUN ("define-coding-system-internal", Fdefine_coding_system_internal,
       Sdefine_coding_system_internal, 1, 1, 0,
       doc: /* Register CODING-SYSTEM as a base coding system.
This function is for internal use only.  */)
     Lisp_Object coding_system;
  Lisp_Object safe_chars, slot;

  if (NILP (Fcheck_coding_system (coding_system)))
    Fsignal (Qcoding_system_error, Fcons (coding_system, Qnil));
  safe_chars = coding_safe_chars (coding_system);
  if (! EQ (safe_chars, Qt) && ! CHAR_TABLE_P (safe_chars))
    error ("No valid safe-chars property for %s",
           SDATA (SYMBOL_NAME (coding_system)));
  if (EQ (safe_chars, Qt))
      if (NILP (Fmemq (coding_system, XCAR (Vcoding_system_safe_chars))))
        XSETCAR (Vcoding_system_safe_chars,
                 Fcons (coding_system, XCAR (Vcoding_system_safe_chars)));
      slot = Fassq (coding_system, XCDR (Vcoding_system_safe_chars));
        XSETCDR (Vcoding_system_safe_chars,
                 nconc2 (XCDR (Vcoding_system_safe_chars),
                         Fcons (Fcons (coding_system, safe_chars), Qnil)));
        XSETCDR (slot, safe_chars);

/*** 9. Post-amble ***/

  /* Emacs' internal format specific initialize routine.  */
  for (i = 0; i <= 0x20; i++)
    emacs_code_class[i] = EMACS_control_code;
  emacs_code_class[0x0A] = EMACS_linefeed_code;
  emacs_code_class[0x0D] = EMACS_carriage_return_code;
  for (i = 0x21 ; i < 0x7F; i++)
    emacs_code_class[i] = EMACS_ascii_code;
  emacs_code_class[0x7F] = EMACS_control_code;
  for (i = 0x80; i < 0xFF; i++)
    emacs_code_class[i] = EMACS_invalid_code;
  emacs_code_class[LEADING_CODE_PRIVATE_11] = EMACS_leading_code_3;
  emacs_code_class[LEADING_CODE_PRIVATE_12] = EMACS_leading_code_3;
  emacs_code_class[LEADING_CODE_PRIVATE_21] = EMACS_leading_code_4;
  emacs_code_class[LEADING_CODE_PRIVATE_22] = EMACS_leading_code_4;

  /* ISO2022 specific initialize routine.  */
  for (i = 0; i < 0x20; i++)
    iso_code_class[i] = ISO_control_0;
  for (i = 0x21; i < 0x7F; i++)
    iso_code_class[i] = ISO_graphic_plane_0;
  for (i = 0x80; i < 0xA0; i++)
    iso_code_class[i] = ISO_control_1;
  for (i = 0xA1; i < 0xFF; i++)
    iso_code_class[i] = ISO_graphic_plane_1;
  iso_code_class[0x20] = iso_code_class[0x7F] = ISO_0x20_or_0x7F;
  iso_code_class[0xA0] = iso_code_class[0xFF] = ISO_0xA0_or_0xFF;
  iso_code_class[ISO_CODE_CR] = ISO_carriage_return;
  iso_code_class[ISO_CODE_SO] = ISO_shift_out;
  iso_code_class[ISO_CODE_SI] = ISO_shift_in;
  iso_code_class[ISO_CODE_SS2_7] = ISO_single_shift_2_7;
  iso_code_class[ISO_CODE_ESC] = ISO_escape;
  iso_code_class[ISO_CODE_SS2] = ISO_single_shift_2;
  iso_code_class[ISO_CODE_SS3] = ISO_single_shift_3;
  iso_code_class[ISO_CODE_CSI] = ISO_control_sequence_introducer;

  setup_coding_system (Qnil, &keyboard_coding);
  setup_coding_system (Qnil, &terminal_coding);
  setup_coding_system (Qnil, &safe_terminal_coding);
  setup_coding_system (Qnil, &default_buffer_file_coding);

  bzero (coding_system_table, sizeof coding_system_table);

  bzero (ascii_skip_code, sizeof ascii_skip_code);
  for (i = 0; i < 128; i++)
    ascii_skip_code[i] = 1;

#if defined (MSDOS) || defined (WINDOWSNT)
  system_eol_type = CODING_EOL_CRLF;
#else
  system_eol_type = CODING_EOL_LF;
#endif

  inhibit_pre_post_conversion = 0;
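
/* Illustrative sketch, not part of coding.c.  The ISO2022 setup above
   fills a 256-entry table by byte range and then overrides a few
   individual codes, so that classifying a byte later is a single
   array lookup.  The standalone example below shows that layout with
   a reduced set of classes.  The `SK_' names and `init_iso_classes'
   are hypothetical; only ESC (0x1B), SO (0x0E) and SI (0x0F) are
   overridden here.  */

```c
#include <assert.h>

enum sk_iso_class
{
  SK_CONTROL_0,            /* 0x00..0x1F */
  SK_GRAPHIC_0,            /* 0x21..0x7E */
  SK_SPACE_OR_DEL,         /* 0x20 and 0x7F */
  SK_CONTROL_1,            /* 0x80..0x9F */
  SK_GRAPHIC_1,            /* 0xA1..0xFE */
  SK_A0_OR_FF,             /* 0xA0 and 0xFF */
  SK_ESCAPE,
  SK_SHIFT_OUT,
  SK_SHIFT_IN
};

/* Fill TABLE so that TABLE[byte] gives the class of BYTE.  */
void
init_iso_classes (enum sk_iso_class *table)
{
  int i;

  assert (table != 0);

  /* Bulk assignment by range.  */
  for (i = 0x00; i < 0x20; i++)
    table[i] = SK_CONTROL_0;
  for (i = 0x21; i < 0x7F; i++)
    table[i] = SK_GRAPHIC_0;
  for (i = 0x80; i < 0xA0; i++)
    table[i] = SK_CONTROL_1;
  for (i = 0xA1; i < 0xFF; i++)
    table[i] = SK_GRAPHIC_1;
  table[0x20] = table[0x7F] = SK_SPACE_OR_DEL;
  table[0xA0] = table[0xFF] = SK_A0_OR_FF;

  /* Individual overrides come last so that they win.  */
  table[0x1B] = SK_ESCAPE;
  table[0x0E] = SK_SHIFT_OUT;
  table[0x0F] = SK_SHIFT_IN;
}
```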

  Qtarget_idx = intern ("target-idx");
  staticpro (&Qtarget_idx);

  Qcoding_system_history = intern ("coding-system-history");
  staticpro (&Qcoding_system_history);
  Fset (Qcoding_system_history, Qnil);

  /* Target FILENAME is the first argument.  */
  Fput (Qinsert_file_contents, Qtarget_idx, make_number (0));
  /* Target FILENAME is the third argument.  */
  Fput (Qwrite_region, Qtarget_idx, make_number (2));

  Qcall_process = intern ("call-process");
  staticpro (&Qcall_process);
  /* Target PROGRAM is the first argument.  */
  Fput (Qcall_process, Qtarget_idx, make_number (0));

  Qcall_process_region = intern ("call-process-region");
  staticpro (&Qcall_process_region);
  /* Target PROGRAM is the third argument.  */
  Fput (Qcall_process_region, Qtarget_idx, make_number (2));

  Qstart_process = intern ("start-process");
  staticpro (&Qstart_process);
  /* Target PROGRAM is the third argument.  */
  Fput (Qstart_process, Qtarget_idx, make_number (2));

  Qopen_network_stream = intern ("open-network-stream");
  staticpro (&Qopen_network_stream);
  /* Target SERVICE is the fourth argument.  */
  Fput (Qopen_network_stream, Qtarget_idx, make_number (3));

  Qcoding_system = intern ("coding-system");
  staticpro (&Qcoding_system);

  Qeol_type = intern ("eol-type");
  staticpro (&Qeol_type);

  Qbuffer_file_coding_system = intern ("buffer-file-coding-system");
  staticpro (&Qbuffer_file_coding_system);

  Qpost_read_conversion = intern ("post-read-conversion");
  staticpro (&Qpost_read_conversion);

  Qpre_write_conversion = intern ("pre-write-conversion");
  staticpro (&Qpre_write_conversion);

  Qno_conversion = intern ("no-conversion");
  staticpro (&Qno_conversion);

  Qundecided = intern ("undecided");
  staticpro (&Qundecided);

  Qcoding_system_p = intern ("coding-system-p");
  staticpro (&Qcoding_system_p);

  Qcoding_system_error = intern ("coding-system-error");
  staticpro (&Qcoding_system_error);

  Fput (Qcoding_system_error, Qerror_conditions,
        Fcons (Qcoding_system_error, Fcons (Qerror, Qnil)));
  Fput (Qcoding_system_error, Qerror_message,
        build_string ("Invalid coding system"));

  Qcoding_category = intern ("coding-category");
  staticpro (&Qcoding_category);
  Qcoding_category_index = intern ("coding-category-index");
  staticpro (&Qcoding_category_index);

  Vcoding_category_table
    = Fmake_vector (make_number (CODING_CATEGORY_IDX_MAX), Qnil);
  staticpro (&Vcoding_category_table);
    for (i = 0; i < CODING_CATEGORY_IDX_MAX; i++)
        XVECTOR (Vcoding_category_table)->contents[i]
          = intern (coding_category_name[i]);
        Fput (XVECTOR (Vcoding_category_table)->contents[i],
              Qcoding_category_index, make_number (i));

  Vcoding_system_safe_chars = Fcons (Qnil, Qnil);
  staticpro (&Vcoding_system_safe_chars);

  Qtranslation_table = intern ("translation-table");
  staticpro (&Qtranslation_table);
  Fput (Qtranslation_table, Qchar_table_extra_slots, make_number (2));

  Qtranslation_table_id = intern ("translation-table-id");
  staticpro (&Qtranslation_table_id);

  Qtranslation_table_for_decode = intern ("translation-table-for-decode");
  staticpro (&Qtranslation_table_for_decode);

  Qtranslation_table_for_encode = intern ("translation-table-for-encode");
  staticpro (&Qtranslation_table_for_encode);

  Qsafe_chars = intern ("safe-chars");
  staticpro (&Qsafe_chars);

  Qchar_coding_system = intern ("char-coding-system");
  staticpro (&Qchar_coding_system);

  /* Intern this now in case it isn't already done.
     Setting this variable twice is harmless.
     But don't staticpro it here--that is done in alloc.c.  */
  Qchar_table_extra_slots = intern ("char-table-extra-slots");
  Fput (Qsafe_chars, Qchar_table_extra_slots, make_number (0));
  Fput (Qchar_coding_system, Qchar_table_extra_slots, make_number (0));

  Qvalid_codes = intern ("valid-codes");
  staticpro (&Qvalid_codes);

  Qemacs_mule = intern ("emacs-mule");
  staticpro (&Qemacs_mule);

  Qraw_text = intern ("raw-text");
  staticpro (&Qraw_text);

  Qutf_8 = intern ("utf-8");
  staticpro (&Qutf_8);

  defsubr (&Scoding_system_p);
  defsubr (&Sread_coding_system);
  defsubr (&Sread_non_nil_coding_system);
  defsubr (&Scheck_coding_system);
  defsubr (&Sdetect_coding_region);
  defsubr (&Sdetect_coding_string);
  defsubr (&Sfind_coding_systems_region_internal);
  defsubr (&Sunencodable_char_position);
  defsubr (&Sdecode_coding_region);
  defsubr (&Sencode_coding_region);
  defsubr (&Sdecode_coding_string);
  defsubr (&Sencode_coding_string);
  defsubr (&Sdecode_sjis_char);
  defsubr (&Sencode_sjis_char);
  defsubr (&Sdecode_big5_char);
  defsubr (&Sencode_big5_char);
  defsubr (&Sset_terminal_coding_system_internal);
  defsubr (&Sset_safe_terminal_coding_system_internal);
  defsubr (&Sterminal_coding_system);
  defsubr (&Sset_keyboard_coding_system_internal);
  defsubr (&Skeyboard_coding_system);
  defsubr (&Sfind_operation_coding_system);
  defsubr (&Supdate_coding_systems_internal);
  defsubr (&Sset_coding_priority_internal);
  defsubr (&Sdefine_coding_system_internal);

  DEFVAR_LISP ("coding-system-list", &Vcoding_system_list,
               doc: /* List of coding systems.

Do not alter the value of this variable manually.  This variable should be
updated by the functions `make-coding-system' and
`define-coding-system-alias'.  */);
  Vcoding_system_list = Qnil;

  DEFVAR_LISP ("coding-system-alist", &Vcoding_system_alist,
               doc: /* Alist of coding system names.
Each element is a one-element list containing a coding system name.
This variable is given to `completing-read' as TABLE argument.

Do not alter the value of this variable manually.  This variable should be
updated by the functions `make-coding-system' and
`define-coding-system-alias'.  */);
  Vcoding_system_alist = Qnil;

  DEFVAR_LISP ("coding-category-list", &Vcoding_category_list,
               doc: /* List of coding-categories (symbols) ordered by priority.

On detecting a coding system, Emacs tries code detection algorithms
associated with each coding-category one by one in this order.  When
one algorithm agrees with a byte sequence of source text, the coding
system bound to the corresponding coding-category is selected.  */);

    Vcoding_category_list = Qnil;
    for (i = CODING_CATEGORY_IDX_MAX - 1; i >= 0; i--)
      Vcoding_category_list
        = Fcons (XVECTOR (Vcoding_category_table)->contents[i],
                 Vcoding_category_list);

  DEFVAR_LISP ("coding-system-for-read", &Vcoding_system_for_read,
               doc: /* Specify the coding system for read operations.
It is useful to bind this variable with `let', but do not set it globally.
If the value is a coding system, it is used for decoding on read operation.
If not, an appropriate element is used from one of the coding system alists:
There are three such tables, `file-coding-system-alist',
`process-coding-system-alist', and `network-coding-system-alist'.  */);
  Vcoding_system_for_read = Qnil;

  DEFVAR_LISP ("coding-system-for-write", &Vcoding_system_for_write,
               doc: /* Specify the coding system for write operations.
Programs bind this variable with `let', but you should not set it globally.
If the value is a coding system, it is used for encoding of output,
when writing it to a file and when sending it to a file or subprocess.

If this does not specify a coding system, an appropriate element
is used from one of the coding system alists:
There are three such tables, `file-coding-system-alist',
`process-coding-system-alist', and `network-coding-system-alist'.
For output to files, if the above procedure does not specify a coding system,
the value of `buffer-file-coding-system' is used.  */);
  Vcoding_system_for_write = Qnil;

  DEFVAR_LISP ("last-coding-system-used", &Vlast_coding_system_used,
               doc: /* Coding system used in the latest file or process I/O.
Also set by `encode-coding-region', `decode-coding-region',
`encode-coding-string' and `decode-coding-string'.  */);
  Vlast_coding_system_used = Qnil;

  DEFVAR_BOOL ("inhibit-eol-conversion", &inhibit_eol_conversion,
               doc: /* *Non-nil means always inhibit code conversion of end-of-line format.
See info node `Coding Systems' and info node `Text and Binary' concerning
such conversion.  */);
  inhibit_eol_conversion = 0;

  DEFVAR_BOOL ("inherit-process-coding-system", &inherit_process_coding_system,
               doc: /* Non-nil means process buffer inherits coding system of process output.
Bind it to t if the process output is to be treated as if it were a file
read from some filesystem.  */);
  inherit_process_coding_system = 0;

  DEFVAR_LISP ("file-coding-system-alist", &Vfile_coding_system_alist,
               doc: /* Alist to decide a coding system to use for a file I/O operation.
The format is ((PATTERN . VAL) ...),
where PATTERN is a regular expression matching a file name,
VAL is a coding system, a cons of coding systems, or a function symbol.
If VAL is a coding system, it is used for both decoding and encoding
If VAL is a cons of coding systems, the car part is used for decoding,
and the cdr part is used for encoding.
If VAL is a function symbol, the function must return a coding system
or a cons of coding systems which are used as above.  The function gets
the arguments with which `find-operation-coding-system' was called.

See also the function `find-operation-coding-system'
and the variable `auto-coding-alist'.  */);
  Vfile_coding_system_alist = Qnil;

  DEFVAR_LISP ("process-coding-system-alist", &Vprocess_coding_system_alist,
               doc: /* Alist to decide a coding system to use for a process I/O operation.
The format is ((PATTERN . VAL) ...),
where PATTERN is a regular expression matching a program name,
VAL is a coding system, a cons of coding systems, or a function symbol.
If VAL is a coding system, it is used for both decoding what is received
from the program and encoding what is sent to the program.
If VAL is a cons of coding systems, the car part is used for decoding,
and the cdr part is used for encoding.
If VAL is a function symbol, the function must return a coding system
or a cons of coding systems which are used as above.

See also the function `find-operation-coding-system'.  */);
  Vprocess_coding_system_alist = Qnil;

  DEFVAR_LISP ("network-coding-system-alist", &Vnetwork_coding_system_alist,
               doc: /* Alist to decide a coding system to use for a network I/O operation.
The format is ((PATTERN . VAL) ...),
where PATTERN is a regular expression matching a network service name
or is a port number to connect to,
VAL is a coding system, a cons of coding systems, or a function symbol.
If VAL is a coding system, it is used for both decoding what is received
from the network stream and encoding what is sent to the network stream.
If VAL is a cons of coding systems, the car part is used for decoding,
and the cdr part is used for encoding.
If VAL is a function symbol, the function must return a coding system
or a cons of coding systems which are used as above.

See also the function `find-operation-coding-system'.  */);
  Vnetwork_coding_system_alist = Qnil;

  DEFVAR_LISP ("locale-coding-system", &Vlocale_coding_system,
               doc: /* Coding system to use with system messages.
Also used for decoding keyboard input on X Window system.  */);
  Vlocale_coding_system = Qnil;

  /* The eol mnemonics are reset in startup.el system-dependently.  */
  DEFVAR_LISP ("eol-mnemonic-unix", &eol_mnemonic_unix,
               doc: /* *String displayed in mode line for UNIX-like (LF) end-of-line format.  */);
  eol_mnemonic_unix = build_string (":");

  DEFVAR_LISP ("eol-mnemonic-dos", &eol_mnemonic_dos,
               doc: /* *String displayed in mode line for DOS-like (CRLF) end-of-line format.  */);
  eol_mnemonic_dos = build_string ("\\");

  DEFVAR_LISP ("eol-mnemonic-mac", &eol_mnemonic_mac,
               doc: /* *String displayed in mode line for MAC-like (CR) end-of-line format.  */);
  eol_mnemonic_mac = build_string ("/");

  DEFVAR_LISP ("eol-mnemonic-undecided", &eol_mnemonic_undecided,
               doc: /* *String displayed in mode line when end-of-line format is not yet determined.  */);
  eol_mnemonic_undecided = build_string (":");

  DEFVAR_LISP ("enable-character-translation", &Venable_character_translation,
               doc: /* *Non-nil enables character translation while encoding and decoding.  */);
  Venable_character_translation = Qt;

  DEFVAR_LISP ("standard-translation-table-for-decode",
               &Vstandard_translation_table_for_decode,
               doc: /* Table for translating characters while decoding.  */);
  Vstandard_translation_table_for_decode = Qnil;

  DEFVAR_LISP ("standard-translation-table-for-encode",
               &Vstandard_translation_table_for_encode,
               doc: /* Table for translating characters while encoding.  */);
  Vstandard_translation_table_for_encode = Qnil;

  DEFVAR_LISP ("charset-revision-table", &Vcharset_revision_alist,
               doc: /* Alist of charsets vs revision numbers.
While encoding, if a charset (car part of an element) is found,
designate it with the escape sequence identifying revision (cdr part of the element).  */);
  Vcharset_revision_alist = Qnil;

  DEFVAR_LISP ("default-process-coding-system",
               &Vdefault_process_coding_system,
               doc: /* Cons of coding systems used for process I/O by default.
The car part is used for decoding a process output,
the cdr part is used for encoding a text to be sent to a process.  */);
  Vdefault_process_coding_system = Qnil;

  DEFVAR_LISP ("latin-extra-code-table", &Vlatin_extra_code_table,
               doc: /* Table of extra Latin codes in the range 128..159 (inclusive).
This is a vector of length 256.
If the Nth element is non-nil, the existence of code N in a file
\(or in the output of a subprocess) does not prevent the text from being
detected as a coding system of an ISO 2022 variant that has the flag
`accept-latin-extra-code' set to t (e.g. iso-latin-1) when reading a file
or the output of a subprocess.
Only the 128th through 159th elements have a meaning.  */);
  Vlatin_extra_code_table = Fmake_vector (make_number (256), Qnil);

  DEFVAR_LISP ("select-safe-coding-system-function",
               &Vselect_safe_coding_system_function,
               doc: /* Function to call to select a safe coding system for encoding text.

If set, this function is called to force the user to select a proper
coding system that can encode the text when the default coding system
used for an operation cannot encode it.

The default value is `select-safe-coding-system' (which see).  */);
  Vselect_safe_coding_system_function = Qnil;

  DEFVAR_BOOL ("coding-system-require-warning",
               &coding_system_require_warning,
               doc: /* Internal use only.
If non-nil, on writing a file, `select-safe-coding-system-function' is
called even if `coding-system-for-write' is non-nil.  The command
`universal-coding-system-argument' binds this variable to t temporarily.  */);
  coding_system_require_warning = 0;

  DEFVAR_BOOL ("inhibit-iso-escape-detection",
               &inhibit_iso_escape_detection,
               doc: /* If non-nil, Emacs ignores ISO2022 escape sequences during code detection.

By default, on reading a file, Emacs tries to detect how the text is
encoded.  This code detection is sensitive to escape sequences.  If
a sequence is valid as ISO2022, the code is determined to be one of
the ISO2022 encodings, and the file is decoded by the corresponding
coding system (e.g. `iso-2022-7bit').

However, there may be cases where you want to read escape sequences in
a file as is.  In such a case, you can set this variable to non-nil.
Then, because the code detection ignores all escape sequences, no file is
detected as encoded in an ISO2022 encoding.  The result is that all
escape sequences become visible in the buffer.

The default value is nil, and it is strongly recommended not to change
it.  That is because many Emacs Lisp source files in the Emacs
distribution that contain non-ASCII characters are encoded in the
coding system `iso-2022-7bit', and they won't be decoded correctly on
reading if you suppress escape sequence detection.

The other way to read escape sequences in a file without decoding is
to explicitly specify some coding system that doesn't use ISO2022
escape sequences (e.g. `latin-1') on reading by \\[universal-coding-system-argument].  */);
  inhibit_iso_escape_detection = 0;

  DEFVAR_LISP ("translation-table-for-input", &Vtranslation_table_for_input,
               doc: /* Char table for translating self-inserting characters.
This is applied to the result of input methods, not their input.  See also
`keyboard-translate-table'.  */);
  Vtranslation_table_for_input = Qnil;

emacs_strerror (error_number)
  synchronize_system_messages_locale ();
  str = strerror (error_number);

  if (! NILP (Vlocale_coding_system))
      Lisp_Object dec = code_convert_string_norecord (build_string (str),
                                                       Vlocale_coding_system,
      str = (char *) SDATA (dec);