stb_image: update to 64fa9a3d of stb.git
[fbvis.git] / stb_image.c
/* stb_image - v2.08 - public domain image loader - http://nothings.org/stb_image.h
                                     no warranty implied; use at your own risk

   Do this:
      #define STB_IMAGE_IMPLEMENTATION
   before you include this file in *one* C or C++ file to create the implementation.

   // i.e. it should look like this:
   #include ...
   #include ...
   #include ...
   #define STB_IMAGE_IMPLEMENTATION
   #include "stb_image.h"

   You can #define STBI_ASSERT(x) before the #include to avoid using assert.h.
   And #define STBI_MALLOC, STBI_REALLOC, and STBI_FREE to avoid using malloc,realloc,free


   QUICK NOTES:
      Primarily of interest to game developers and other people who can
          avoid problematic images and only need the trivial interface

      JPEG baseline & progressive (12 bpc/arithmetic not supported, same as stock IJG lib)
      PNG 1/2/4/8-bit-per-channel (16 bpc not supported)

      TGA (not sure what subset, if a subset)
      BMP non-1bpp, non-RLE
      PSD (composited view only, no extra channels, 8/16 bit-per-channel)

      GIF (*comp always reports as 4-channel)
      HDR (radiance rgbE format)
      PIC (Softimage PIC)
      PNM (PPM and PGM binary only)

      Animated GIF still needs a proper API, but here's one way to do it:
          http://gist.github.com/urraka/685d9a6340b26b830d49

      - decode from memory or through FILE (define STBI_NO_STDIO to remove code)
      - decode from arbitrary I/O callbacks
      - SIMD acceleration on x86/x64 (SSE2) and ARM (NEON)

   Full documentation under "DOCUMENTATION" below.


   Revision 2.00 release notes:

      - Progressive JPEG is now supported.

      - PPM and PGM binary formats are now supported, thanks to Ken Miller.

      - x86 platforms now make use of SSE2 SIMD instructions for
        JPEG decoding, and ARM platforms can use NEON SIMD if requested.
        This work was done by Fabian "ryg" Giesen. SSE2 is used by
        default, but NEON must be enabled explicitly; see docs.

        With other JPEG optimizations included in this version, we see
        2x speedup on a JPEG on an x86 machine, and a 1.5x speedup
        on a JPEG on an ARM machine, relative to previous versions of this
        library. The same results will not obtain for all JPGs and for all
        x86/ARM machines. (Note that progressive JPEGs are significantly
        slower to decode than regular JPEGs.) This doesn't mean that this
        is the fastest JPEG decoder in the land; rather, it brings it
        closer to parity with standard libraries. If you want the fastest
        decode, look elsewhere. (See "Philosophy" section of docs below.)

        See final bullet items below for more info on SIMD.

      - Added STBI_MALLOC, STBI_REALLOC, and STBI_FREE macros for replacing
        the memory allocator. Unlike other STBI libraries, these macros don't
        support a context parameter, so if you need to pass a context in to
        the allocator, you'll have to store it in a global or a thread-local
        variable.

      - Split existing STBI_NO_HDR flag into two flags, STBI_NO_HDR and
        STBI_NO_LINEAR.
            STBI_NO_HDR:     suppress implementation of .hdr reader format
            STBI_NO_LINEAR:  suppress high-dynamic-range light-linear float API

      - You can suppress implementation of any of the decoders to reduce
        your code footprint by #defining one or more of the following
        symbols before creating the implementation.

            STBI_NO_JPEG
            STBI_NO_PNG
            STBI_NO_BMP
            STBI_NO_PSD
            STBI_NO_TGA
            STBI_NO_GIF
            STBI_NO_HDR
            STBI_NO_PIC
            STBI_NO_PNM   (.ppm and .pgm)

      - You can request *only* certain decoders and suppress all other ones
        (this will be more forward-compatible, as addition of new decoders
        doesn't require you to disable them explicitly):

            STBI_ONLY_JPEG
            STBI_ONLY_PNG
            STBI_ONLY_BMP
            STBI_ONLY_PSD
            STBI_ONLY_TGA
            STBI_ONLY_GIF
            STBI_ONLY_HDR
            STBI_ONLY_PIC
            STBI_ONLY_PNM   (.ppm and .pgm)

         Note that you can define multiples of these, and you will get all
         of them ("only x" and "only y" is interpreted to mean "only x&y").

       - If you use STBI_NO_PNG (or _ONLY_ without PNG), and you still
         want the zlib decoder to be available, #define STBI_SUPPORT_ZLIB

      - Compilation of all SIMD code can be suppressed with
            #define STBI_NO_SIMD
        It should not be necessary to disable SIMD unless you have issues
        compiling (e.g. using an x86 compiler which doesn't support SSE
        intrinsics or that doesn't support the method used to detect
        SSE2 support at run-time), and even those can be reported as
        bugs so I can refine the built-in compile-time checking to be
        smarter.

      - The old STBI_SIMD system which allowed installing a user-defined
        IDCT etc. has been removed. If you need this, don't upgrade. My
        assumption is that almost nobody was doing this, and those who
        were will find the built-in SIMD more satisfactory anyway.

      - RGB values computed for JPEG images are slightly different from
        previous versions of stb_image. (This is due to using less
        integer precision in SIMD.) The C code has been adjusted so
        that the same RGB values will be computed regardless of whether
        SIMD support is available, so your app should always produce
        consistent results. But these results are slightly different from
        previous versions. (Specifically, about 3% of available YCbCr values
        will compute different RGB results from pre-1.49 versions by +-1;
        most of the deviating values are one smaller in the G channel.)

      - If you must produce consistent results with previous versions of
        stb_image, #define STBI_JPEG_OLD and you will get the same results
        you used to; however, you will not get the SIMD speedups for
        the YCbCr-to-RGB conversion step (although you should still see
        significant JPEG speedup from the other changes).

        Please note that STBI_JPEG_OLD is a temporary feature; it will be
        removed in future versions of the library. It is only intended for
        near-term back-compatibility use.


   Latest revision history:
      2.08  (2015-09-13) fix to 2.07 cleanup, reading RGB PSD as RGBA
      2.07  (2015-09-13) partial animated GIF support
                         limited 16-bit PSD support
                         minor bugs, code cleanup, and compiler warnings
      2.06  (2015-04-19) fix bug where PSD returns wrong '*comp' value
      2.05  (2015-04-19) fix bug in progressive JPEG handling, fix warning
      2.04  (2015-04-15) try to re-enable SIMD on MinGW 64-bit
      2.03  (2015-04-12) additional corruption checking
                         stbi_set_flip_vertically_on_load
                         fix NEON support; fix mingw support
      2.02  (2015-01-19) fix incorrect assert, fix warning
      2.01  (2015-01-17) fix various warnings
      2.00b (2014-12-25) fix STBI_MALLOC in progressive JPEG
      2.00  (2014-12-25) optimize JPEG, including x86 SSE2 & ARM NEON SIMD
                         progressive JPEG
                         PGM/PPM support
                         STBI_MALLOC,STBI_REALLOC,STBI_FREE
                         STBI_NO_*, STBI_ONLY_*
                         GIF bugfix
      1.48  (2014-12-14) fix incorrectly-named assert()
      1.47  (2014-12-14) 1/2/4-bit PNG support (both grayscale and paletted)
                         optimize PNG
                         fix bug in interlaced PNG with user-specified channel count

   See end of file for full revision history.


 ============================    Contributors    =========================

 Image formats                                Bug fixes & warning fixes
    Sean Barrett (jpeg, png, bmp)                Marc LeBlanc
    Nicolas Schulz (hdr, psd)                    Christpher Lloyd
    Jonathan Dummer (tga)                        Dave Moore
    Jean-Marc Lienher (gif)                      Won Chun
    Tom Seddon (pic)                             the Horde3D community
    Thatcher Ulrich (psd)                        Janez Zemva
    Ken Miller (pgm, ppm)                        Jonathan Blow
    urraka@github (animated gif)                 Laurent Gomila
                                                 Aruelien Pocheville
                                                 Ryamond Barbiero
                                                 David Woo
 Extensions, features                            Martin Golini
    Jetro Lauha (stbi_info)                      Roy Eltham
    Martin "SpartanJ" Golini (stbi_info)         Luke Graham
    James "moose2000" Brown (iPhone PNG)         Thomas Ruf
    Ben "Disch" Wenger (io callbacks)            John Bartholomew
    Omar Cornut (1/2/4-bit PNG)                  Ken Hamada
    Nicolas Guillemot (vertical flip)            Cort Stratton
    Richard Mitton (16-bit PSD)                  Blazej Dariusz Roszkowski
                                                 Thibault Reuille
                                                 Paul Du Bois
                                                 Guillaume George
                                                 Jerry Jansson
                                                 Hayaki Saito
                                                 Johan Duparc
                                                 Ronny Chevalier
 Optimizations & bugfixes                        Michal Cichon
    Fabian "ryg" Giesen                          Tero Hanninen
    Arseny Kapoulkine                            Sergio Gonzalez
                                                 Cass Everitt
                                                 Engin Manap
  If your name should be here but                Martins Mozeiko
  isn't, let Sean know.                          Joseph Thomson
                                                 Phil Jordan
                                                 Nathan Reed
                                                 Michaelangel007@github
                                                 Nick Verigakis

LICENSE

This software is in the public domain. Where that dedication is not
recognized, you are granted a perpetual, irrevocable license to copy,
distribute, and modify this file as you see fit.

*/

#ifndef STBI_INCLUDE_STB_IMAGE_H
#define STBI_INCLUDE_STB_IMAGE_H

// DOCUMENTATION
//
// Limitations:
//    - no 16-bit-per-channel PNG
//    - no 12-bit-per-channel JPEG
//    - no JPEGs with arithmetic coding
//    - no 1-bit BMP
//    - GIF always returns *comp=4
//
// Basic usage (see HDR discussion below for HDR usage):
//    int x,y,n;
//    unsigned char *data = stbi_load(filename, &x, &y, &n, 0);
//    // ... process data if not NULL ...
//    // ... x = width, y = height, n = # 8-bit components per pixel ...
//    // ... replace '0' with '1'..'4' to force that many components per pixel
//    // ... but 'n' will always be the number that it would have been if you said 0
//    stbi_image_free(data);
//
// Standard parameters:
//    int *x       -- outputs image width in pixels
//    int *y       -- outputs image height in pixels
//    int *comp    -- outputs # of image components in image file
//    int req_comp -- if non-zero, # of image components requested in result
//
// The return value from an image loader is an 'unsigned char *' which points
// to the pixel data, or NULL on an allocation failure or if the image is
// corrupt or invalid. The pixel data consists of *y scanlines of *x pixels,
// with each pixel consisting of N interleaved 8-bit components; the first
// pixel pointed to is top-left-most in the image. There is no padding between
// image scanlines or between pixels, regardless of format. The number of
// components N is 'req_comp' if req_comp is non-zero, or *comp otherwise.
// If req_comp is non-zero, *comp has the number of components that _would_
// have been output otherwise. E.g. if you set req_comp to 4, you will always
// get RGBA output, but you can check *comp to see if it's trivially opaque
// because e.g. there were only 3 channels in the source image.
//
// An output image with N components has the following components interleaved
// in this order in each pixel:
//
//     N=#comp     components
//       1           grey
//       2           grey, alpha
//       3           red, green, blue
//       4           red, green, blue, alpha
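//
// The layout described above (top-left origin, no padding, N interleaved
// components) can be sketched as plain index arithmetic. This is only an
// illustration, not code from stb_image: pixel_offset and pixel_alpha are
// hypothetical helper names, and treating images without an alpha channel
// as opaque is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper (not part of stb_image): byte offset of component c
// of pixel (px, py) in a tightly packed image that is w pixels wide and has
// n interleaved 8-bit components per pixel. Rows are stored top to bottom.
static size_t pixel_offset(int w, int n, int px, int py, int c)
{
   assert(w > 0 && n >= 1 && n <= 4);
   assert(px >= 0 && px < w && py >= 0 && c >= 0 && c < n);
   return ((size_t) py * (size_t) w + (size_t) px) * (size_t) n + (size_t) c;
}

// Hypothetical helper: alpha of a pixel. Alpha is the last component when
// n is 2 (grey, alpha) or 4 (red, green, blue, alpha); other layouts are
// treated as fully opaque here.
static unsigned char pixel_alpha(const unsigned char *data, int w, int n, int px, int py)
{
   if (n == 2 || n == 4)
      return data[pixel_offset(w, n, px, py, n - 1)];
   return 255;
}
```

// With n taken from req_comp (or from *comp when req_comp is 0), the red
// value of pixel (px, py) in an RGB or RGBA image would then be
// data[pixel_offset(w, n, px, py, 0)].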
//
// If image loading fails for any reason, the return value will be NULL,
// and *x, *y, *comp will be unchanged. The function stbi_failure_reason()
// can be queried for an extremely brief, end-user unfriendly explanation
// of why the load failed. Define STBI_NO_FAILURE_STRINGS to avoid
// compiling these strings at all, and STBI_FAILURE_USERMSG to get slightly
// more user-friendly ones.
//
// Paletted PNG, BMP, GIF, and PIC images are automatically depalettized.
//
// ===========================================================================
//
// Philosophy
//
// stb libraries are designed with the following priorities:
//
//    1. easy to use
//    2. easy to maintain
//    3. good performance
//
// Sometimes I let "good performance" creep up in priority over "easy to maintain",
// and for best performance I may provide less-easy-to-use APIs that give higher
// performance, in addition to the easy to use ones. Nevertheless, it's important
// to keep in mind that from the standpoint of you, a client of this library,
// all you care about is #1 and #3, and stb libraries do not emphasize #3 above all.
//
// Some secondary priorities arise directly from the first two, some of which
// make more explicit reasons why performance can't be emphasized.
//
//    - Portable ("ease of use")
//    - Small footprint ("easy to maintain")
//    - No dependencies ("ease of use")
//
// ===========================================================================
//
// I/O callbacks
//
// I/O callbacks allow you to read from arbitrary sources, like packaged
// files or some other source. Data read from callbacks are processed
// through a small internal buffer (currently 128 bytes) to try to reduce
// overhead.
//
// The three functions you must define are "read" (reads some bytes of data),
// "skip" (skips some bytes of data), "eof" (reports if the stream is at the end).
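//
// As a rough sketch of what the three callbacks might look like for an
// in-memory source: everything named demo_* below is hypothetical, and the
// struct is a local mirror of the read/skip/eof layout so that the example
// compiles on its own. Real code would fill in a stbi_io_callbacks and pass
// it, together with the user pointer, to stbi_load_from_callbacks.

```c
#include <assert.h>
#include <string.h>

// Local mirror of the three-callback layout (assumption: same order and
// signatures as stbi_io_callbacks declared later in this file).
typedef struct
{
   int  (*read)(void *user, char *data, int size);
   void (*skip)(void *user, int n);
   int  (*eof) (void *user);
} demo_io_callbacks;

// Hypothetical in-memory stream, passed around as the 'user' pointer.
typedef struct
{
   const unsigned char *data;
   int len;
   int pos;
} demo_mem_stream;

// Copy up to 'size' bytes into 'out'; return the number actually copied.
static int demo_read(void *user, char *out, int size)
{
   demo_mem_stream *s = (demo_mem_stream *) user;
   int remaining = s->len - s->pos;
   int n = size < remaining ? size : remaining;
   assert(size >= 0);
   memcpy(out, s->data + s->pos, (size_t) n);
   s->pos += n;
   return n;
}

// Move the read position by n bytes; a negative n steps backwards.
// The position is clamped to the bounds of the buffer.
static void demo_skip(void *user, int n)
{
   demo_mem_stream *s = (demo_mem_stream *) user;
   int pos = s->pos + n;
   if (pos < 0) pos = 0;
   if (pos > s->len) pos = s->len;
   s->pos = pos;
}

// Nonzero once every byte has been consumed.
static int demo_eof(void *user)
{
   demo_mem_stream *s = (demo_mem_stream *) user;
   return s->pos >= s->len;
}

static const demo_io_callbacks demo_callbacks = { demo_read, demo_skip, demo_eof };
```

// For data that is already in memory, stbi_load_from_memory is the simpler
// route; a callback set like this is mainly useful when the bytes come from
// an archive, a socket, or some other custom source.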
//
// ===========================================================================
//
// SIMD support
//
// The JPEG decoder will try to automatically use SIMD kernels on x86 when
// supported by the compiler. For ARM Neon support, you must explicitly
// request it.
//
// (The old do-it-yourself SIMD API is no longer supported in the current
// code.)
//
// On x86, SSE2 will automatically be used when available based on a run-time
// test; if not, the generic C versions are used as a fall-back. On ARM targets,
// the typical path is to have separate builds for NEON and non-NEON devices
// (at least this is true for iOS and Android). Therefore, the NEON support is
// toggled by a build flag: define STBI_NEON to get NEON loops.
//
// The output of the JPEG decoder is slightly different from that of versions
// that predate SIMD support (that is, versions before 1.49). The
// difference is only +-1 in the 8-bit RGB channels, and only on a small
// fraction of pixels. You can force the pre-1.49 behavior by defining
// STBI_JPEG_OLD, but this will disable some of the SIMD decoding path
// and hence cost some performance.
//
// If for some reason you do not want to use any of the SIMD code, or if
// you have issues compiling it, you can disable it entirely by
// defining STBI_NO_SIMD.
//
// ===========================================================================
//
// HDR image support   (disable by defining STBI_NO_HDR)
//
// stb_image now supports loading HDR images in general, and currently
// the Radiance .HDR file format, although the support is provided
// generically. You can still load any file through the existing interface;
// if you attempt to load an HDR file, it will be automatically remapped to
// LDR, assuming gamma 2.2 and an arbitrary scale factor defaulting to 1;
// both of these constants can be reconfigured through this interface:
//
//     stbi_hdr_to_ldr_gamma(2.2f);
//     stbi_hdr_to_ldr_scale(1.0f);
//
// (note, do not use _inverse_ constants; stb_image will invert them
// appropriately).
//
// Additionally, there is a new, parallel interface for loading files as
// (linear) floats to preserve the full dynamic range:
//
//    float *data = stbi_loadf(filename, &x, &y, &n, 0);
//
// If you load LDR images through this interface, those images will
// be promoted to floating point values, run through the inverse of
// constants corresponding to the above:
//
//     stbi_ldr_to_hdr_scale(1.0f);
//     stbi_ldr_to_hdr_gamma(2.2f);
//
// Finally, given a filename (or an open file or memory block--see header
// file for details) containing image data, you can query for the "most
// appropriate" interface to use (that is, whether the image is HDR or
// not), using:
//
//     stbi_is_hdr(char *filename);
//
// ===========================================================================
//
// iPhone PNG support:
//
// By default we convert iphone-formatted PNGs back to RGB, even though
// they are internally encoded differently. You can disable this conversion
// by calling stbi_convert_iphone_png_to_rgb(0), in which case
// you will always just get the native iphone "format" through (which
// is BGR stored in RGB).
//
// Call stbi_set_unpremultiply_on_load(1) as well to force a divide per
// pixel to remove any premultiplied alpha *only* if the image file explicitly
// says there's premultiplied data (currently only happens in iPhone images,
// and only if iPhone convert-to-rgb processing is on).
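//
// The "divide per pixel" mentioned above can be sketched roughly as follows.
// This is not stb_image's internal code: unpremultiply_rgba is a hypothetical
// helper, and both the clamp and the handling of alpha 0 are choices made
// for this example only.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: undo premultiplied alpha for one 4-component RGBA
// pixel in place. Each color channel is scaled by 255/alpha using integer
// math. Pixels with alpha 0 are left untouched, since the original color
// cannot be recovered from them.
static void unpremultiply_rgba(unsigned char *p)
{
   unsigned int a;
   int i;
   assert(p != NULL);
   a = p[3];
   if (a == 0) return;
   for (i = 0; i < 3; ++i) {
      unsigned int v = p[i] * 255u / a;
      // A color larger than its alpha is not valid premultiplied data;
      // this sketch clamps the result instead of letting it wrap.
      p[i] = (unsigned char) (v > 255u ? 255u : v);
   }
}
```

// For example, the premultiplied pixel (64, 32, 128, 128) becomes
// (127, 63, 255, 128): the integer division truncates toward zero.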
//


#ifndef STBI_NO_STDIO
#include <stdio.h>
#endif // STBI_NO_STDIO

#define STBI_VERSION 1

enum
{
   STBI_default = 0, // only used for req_comp

   STBI_grey       = 1,
   STBI_grey_alpha = 2,
   STBI_rgb        = 3,
   STBI_rgb_alpha  = 4
};

typedef unsigned char stbi_uc;

#ifdef __cplusplus
extern "C" {
#endif

#ifdef STB_IMAGE_STATIC
#define STBIDEF static
#else
#define STBIDEF extern
#endif

//////////////////////////////////////////////////////////////////////////////
//
// PRIMARY API - works on images of any type
//

//
// load image by filename, open file, or memory buffer
//

typedef struct
{
   int      (*read)  (void *user,char *data,int size);   // fill 'data' with 'size' bytes.  return number of bytes actually read
   void     (*skip)  (void *user,int n);                 // skip the next 'n' bytes, or 'unget' the last -n bytes if negative
   int      (*eof)   (void *user);                       // returns nonzero if we are at end of file/data
} stbi_io_callbacks;

STBIDEF stbi_uc *stbi_load               (char              const *filename,           int *x, int *y, int *comp, int req_comp);
STBIDEF stbi_uc *stbi_load_from_memory   (stbi_uc           const *buffer, int len   , int *x, int *y, int *comp, int req_comp);
STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk  , void *user, int *x, int *y, int *comp, int req_comp);

#ifndef STBI_NO_STDIO
STBIDEF stbi_uc *stbi_load_from_file  (FILE *f,                  int *x, int *y, int *comp, int req_comp);
// for stbi_load_from_file, file pointer is left pointing immediately after image
#endif

#ifndef STBI_NO_LINEAR
   STBIDEF float *stbi_loadf                 (char const *filename,           int *x, int *y, int *comp, int req_comp);
   STBIDEF float *stbi_loadf_from_memory     (stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp);
   STBIDEF float *stbi_loadf_from_callbacks  (stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp);

   #ifndef STBI_NO_STDIO
   STBIDEF float *stbi_loadf_from_file  (FILE *f,                int *x, int *y, int *comp, int req_comp);
   #endif
#endif

#ifndef STBI_NO_HDR
   STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma);
   STBIDEF void   stbi_hdr_to_ldr_scale(float scale);
#endif

#ifndef STBI_NO_LINEAR
   STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma);
   STBIDEF void   stbi_ldr_to_hdr_scale(float scale);
#endif // STBI_NO_LINEAR

// stbi_is_hdr is always defined, but always returns false if STBI_NO_HDR
STBIDEF int    stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user);
STBIDEF int    stbi_is_hdr_from_memory(stbi_uc const *buffer, int len);
#ifndef STBI_NO_STDIO
STBIDEF int      stbi_is_hdr          (char const *filename);
STBIDEF int      stbi_is_hdr_from_file(FILE *f);
#endif // STBI_NO_STDIO


// get a VERY brief reason for failure
// NOT THREADSAFE
STBIDEF const char *stbi_failure_reason  (void);

// free the loaded image -- this is just free()
STBIDEF void     stbi_image_free      (void *retval_from_stbi_load);

// get image dimensions & components without fully decoding
STBIDEF int      stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp);
STBIDEF int      stbi_info_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp);

#ifndef STBI_NO_STDIO
STBIDEF int      stbi_info            (char const *filename,     int *x, int *y, int *comp);
STBIDEF int      stbi_info_from_file  (FILE *f,                  int *x, int *y, int *comp);

#endif



// for image formats that explicitly notate that they have premultiplied alpha,
// we just return the colors as stored in the file. set this flag to force
// unpremultiplication. results are undefined if the unpremultiply overflows.
STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply);

// indicate whether we should process iphone images back to canonical format,
// or just pass them through "as-is"
STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert);

// flip the image vertically, so the first pixel in the output array is the bottom left
STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip);
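
// What a vertical flip means for the pixel array can be sketched like this.
// It is an illustration only, not the library's implementation:
// flip_vertically is a hypothetical helper that swaps row r with row
// (h - 1 - r) of a tightly packed image.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: flip a tightly packed image in place. 'w' and 'h'
// are the image size in pixels, 'n' the number of 8-bit components per
// pixel. When h is odd, the middle row stays where it is.
static void flip_vertically(unsigned char *data, int w, int h, int n)
{
   size_t stride;
   int row;
   assert(data != NULL && w > 0 && h > 0 && n > 0);
   stride = (size_t) w * (size_t) n;   // bytes per scanline, no padding
   for (row = 0; row < h / 2; ++row) {
      unsigned char *top = data + (size_t) row * stride;
      unsigned char *bot = data + (size_t) (h - 1 - row) * stride;
      size_t i;
      for (i = 0; i < stride; ++i) {
         unsigned char t = top[i];
         top[i] = bot[i];
         bot[i] = t;
      }
   }
}
```

// After the flip, the first pixel in the array is the bottom-left pixel of
// the original image, which is the order some graphics APIs expect.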

// ZLIB client - used by PNG, available for other purposes

STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen);
STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header);
STBIDEF char *stbi_zlib_decode_malloc(const char *buffer, int len, int *outlen);
STBIDEF int   stbi_zlib_decode_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);

STBIDEF char *stbi_zlib_decode_noheader_malloc(const char *buffer, int len, int *outlen);
STBIDEF int   stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);


#ifdef __cplusplus
}
#endif

//
//
////   end header file   /////////////////////////////////////////////////////
#endif // STBI_INCLUDE_STB_IMAGE_H

#ifdef STB_IMAGE_IMPLEMENTATION

#if defined(STBI_ONLY_JPEG) || defined(STBI_ONLY_PNG) || defined(STBI_ONLY_BMP) \
  || defined(STBI_ONLY_TGA) || defined(STBI_ONLY_GIF) || defined(STBI_ONLY_PSD) \
  || defined(STBI_ONLY_HDR) || defined(STBI_ONLY_PIC) || defined(STBI_ONLY_PNM) \
  || defined(STBI_ONLY_ZLIB)
   #ifndef STBI_ONLY_JPEG
   #define STBI_NO_JPEG
   #endif
   #ifndef STBI_ONLY_PNG
   #define STBI_NO_PNG
   #endif
   #ifndef STBI_ONLY_BMP
   #define STBI_NO_BMP
   #endif
   #ifndef STBI_ONLY_PSD
   #define STBI_NO_PSD
   #endif
   #ifndef STBI_ONLY_TGA
   #define STBI_NO_TGA
   #endif
   #ifndef STBI_ONLY_GIF
   #define STBI_NO_GIF
   #endif
   #ifndef STBI_ONLY_HDR
   #define STBI_NO_HDR
   #endif
   #ifndef STBI_ONLY_PIC
   #define STBI_NO_PIC
   #endif
   #ifndef STBI_ONLY_PNM
   #define STBI_NO_PNM
   #endif
#endif

#if defined(STBI_NO_PNG) && !defined(STBI_SUPPORT_ZLIB) && !defined(STBI_NO_ZLIB)
#define STBI_NO_ZLIB
#endif
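
// The ONLY-to-NO translation above can be tried out in isolation. The sketch
// below repeats the same pattern with hypothetical DEMO_* symbols (so it
// cannot collide with a real stb_image build) and three decoders instead of
// nine; demo_enabled_decoders is likewise made up for the example.

```c
#include <assert.h>

// Pretend the client asked for "only PNG" and "only BMP".
#define DEMO_ONLY_PNG
#define DEMO_ONLY_BMP

// If any DEMO_ONLY_* symbol is defined, every decoder that was not
// explicitly requested gets its DEMO_NO_* symbol defined.
#if defined(DEMO_ONLY_JPEG) || defined(DEMO_ONLY_PNG) || defined(DEMO_ONLY_BMP)
   #ifndef DEMO_ONLY_JPEG
   #define DEMO_NO_JPEG
   #endif
   #ifndef DEMO_ONLY_PNG
   #define DEMO_NO_PNG
   #endif
   #ifndef DEMO_ONLY_BMP
   #define DEMO_NO_BMP
   #endif
#endif

enum { DEMO_JPEG = 1, DEMO_PNG = 2, DEMO_BMP = 4 };

// Report which decoders survived as a bit mask; the rest of the code only
// ever needs to test the DEMO_NO_* symbols.
static int demo_enabled_decoders(void)
{
   int mask = 0;
#ifndef DEMO_NO_JPEG
   mask |= DEMO_JPEG;
#endif
#ifndef DEMO_NO_PNG
   mask |= DEMO_PNG;
#endif
#ifndef DEMO_NO_BMP
   mask |= DEMO_BMP;
#endif
   return mask;
}
```

// With the two DEMO_ONLY_* defines shown, the mask is DEMO_PNG | DEMO_BMP,
// matching the "only x" and "only y" means "only x&y" rule from the notes.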


#include <stdarg.h>
#include <stddef.h> // ptrdiff_t on osx
#include <stdlib.h>
#include <string.h>

#if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR)
#include <math.h>  // ldexp
#endif

#ifndef STBI_NO_STDIO
#include <stdio.h>
#endif

#ifndef STBI_ASSERT
#include <assert.h>
#define STBI_ASSERT(x) assert(x)
#endif


#ifndef _MSC_VER
   #ifdef __cplusplus
   #define stbi_inline inline
   #else
   #define stbi_inline
   #endif
#else
   #define stbi_inline __forceinline
#endif


#ifdef _MSC_VER
typedef unsigned short stbi__uint16;
typedef   signed short stbi__int16;
typedef unsigned int   stbi__uint32;
typedef   signed int   stbi__int32;
#else
#include <stdint.h>
typedef uint16_t stbi__uint16;
typedef int16_t  stbi__int16;
typedef uint32_t stbi__uint32;
typedef int32_t  stbi__int32;
#endif

// should produce compiler error if size is wrong
typedef unsigned char validate_uint32[sizeof(stbi__uint32)==4 ? 1 : -1];

#ifdef _MSC_VER
#define STBI_NOTUSED(v)  (void)(v)
#else
#define STBI_NOTUSED(v)  (void)sizeof(v)
#endif

#ifdef _MSC_VER
#define STBI_HAS_LROTL
#endif

#ifdef STBI_HAS_LROTL
   #define stbi_lrot(x,y)  _lrotl(x,y)
#else
   #define stbi_lrot(x,y)  (((x) << (y)) | ((x) >> (32 - (y))))
#endif
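
// A note on the fallback rotate: with a 32-bit x and y == 0, the right shift
// becomes a shift by 32, which C leaves undefined. Whether any caller in this
// file ever passes 0 is not checked here. A variant that is defined for every
// count could look like the sketch below; rotl32 is a hypothetical name, not
// something stb_image provides.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: rotate a 32-bit value left by n bits. Masking the
// count keeps both shift amounts in the range 0..31, so n == 0 (and any
// multiple of 32) simply returns x unchanged.
static uint32_t rotl32(uint32_t x, unsigned int n)
{
   n &= 31u;
   return (x << n) | (x >> ((32u - n) & 31u));
}
```

// Common compilers typically recognise this pattern and emit a single rotate
// instruction, but that is an optimisation detail, not a guarantee.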

#if defined(STBI_MALLOC) && defined(STBI_FREE) && defined(STBI_REALLOC)
// ok
#elif !defined(STBI_MALLOC) && !defined(STBI_FREE) && !defined(STBI_REALLOC)
// ok
#else
#error "Must define all or none of STBI_MALLOC, STBI_FREE, and STBI_REALLOC."
#endif

#ifndef STBI_MALLOC
#define STBI_MALLOC(sz)    malloc(sz)
#define STBI_REALLOC(p,sz) realloc(p,sz)
#define STBI_FREE(p)       free(p)
#endif

// x86/x64 detection
#if defined(__x86_64__) || defined(_M_X64)
#define STBI__X64_TARGET
#elif defined(__i386) || defined(_M_IX86)
#define STBI__X86_TARGET
#endif

#if defined(__GNUC__) && (defined(STBI__X86_TARGET) || defined(STBI__X64_TARGET)) && !defined(__SSE2__) && !defined(STBI_NO_SIMD)
// NOTE: it is not clear whether we actually need this for the 64-bit path.
// gcc doesn't support sse2 intrinsics unless you compile with -msse2,
// (but compiling with -msse2 allows the compiler to use SSE2 everywhere;
// this is just broken and gcc are jerks for not fixing it properly
// http://www.virtualdub.org/blog/pivot/entry.php?id=363 )
#define STBI_NO_SIMD
#endif

#if defined(__MINGW32__) && defined(STBI__X86_TARGET) && !defined(STBI_MINGW_ENABLE_SSE2) && !defined(STBI_NO_SIMD)
// Note that __MINGW32__ doesn't actually mean 32-bit, so we have to avoid STBI__X64_TARGET
//
// 32-bit MinGW wants ESP to be 16-byte aligned, but this is not in the
// Windows ABI and VC++ as well as Windows DLLs don't maintain that invariant.
// As a result, enabling SSE2 on 32-bit MinGW is dangerous when not
// simultaneously enabling "-mstackrealign".
//
// See https://github.com/nothings/stb/issues/81 for more information.
//
// So default to no SSE2 on 32-bit MinGW. If you've read this far and added
// -mstackrealign to your build settings, feel free to #define STBI_MINGW_ENABLE_SSE2.
#define STBI_NO_SIMD
#endif

#if !defined(STBI_NO_SIMD) && defined(STBI__X86_TARGET)
#define STBI_SSE2
#include <emmintrin.h>

#ifdef _MSC_VER

#if _MSC_VER >= 1400  // not VC6
#include <intrin.h> // __cpuid
static int stbi__cpuid3(void)
{
   int info[4];
   __cpuid(info,1);
   return info[3];
}
#else
static int stbi__cpuid3(void)
{
   int res;
   __asm {
      mov  eax,1
      cpuid
      mov  res,edx
   }
   return res;
}
#endif

#define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name

static int stbi__sse2_available()
{
   int info3 = stbi__cpuid3();
   return ((info3 >> 26) & 1) != 0;
}
#else // assume GCC-style if not VC++
#define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))

static int stbi__sse2_available()
{
#if defined(__GNUC__) && (__GNUC__ * 100 + __GNUC_MINOR__) >= 408 // GCC 4.8 or later
   // GCC 4.8+ has a nice way to do this
   return __builtin_cpu_supports("sse2");
#else
   // portable way to do this, preferably without using GCC inline ASM?
   // just bail for now.
   return 0;
#endif
}
#endif
#endif

// ARM NEON
#if defined(STBI_NO_SIMD) && defined(STBI_NEON)
#undef STBI_NEON
#endif

#ifdef STBI_NEON
#include <arm_neon.h>
// assume GCC or Clang on ARM targets
#define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
#endif

#ifndef STBI_SIMD_ALIGN
#define STBI_SIMD_ALIGN(type, name) type name
#endif

///////////////////////////////////////////////
//
//  stbi__context struct and start_xxx functions

// stbi__context structure is our basic context used by all images, so it
// contains all the IO context, plus some basic image information
typedef struct
{
   stbi__uint32 img_x, img_y;
   int img_n, img_out_n;

   stbi_io_callbacks io;
   void *io_user_data;

   int read_from_callbacks;
   int buflen;
   stbi_uc buffer_start[128];

   stbi_uc *img_buffer, *img_buffer_end;
   stbi_uc *img_buffer_original, *img_buffer_original_end;
} stbi__context;


static void stbi__refill_buffer(stbi__context *s);

// initialize a memory-decode context
static void stbi__start_mem(stbi__context *s, stbi_uc const *buffer, int len)
{
   s->io.read = NULL;
   s->read_from_callbacks = 0;
   s->img_buffer = s->img_buffer_original = (stbi_uc *) buffer;
   s->img_buffer_end = s->img_buffer_original_end = (stbi_uc *) buffer+len;
}

// initialize a callback-based context
static void stbi__start_callbacks(stbi__context *s, stbi_io_callbacks *c, void *user)
{
   s->io = *c;
   s->io_user_data = user;
   s->buflen = sizeof(s->buffer_start);
   s->read_from_callbacks = 1;
   s->img_buffer_original = s->buffer_start;
   stbi__refill_buffer(s);
   s->img_buffer_original_end = s->img_buffer_end;
}

#ifndef STBI_NO_STDIO

static int stbi__stdio_read(void *user, char *data, int size)
{
   return (int) fread(data,1,size,(FILE*) user);
}

static void stbi__stdio_skip(void *user, int n)
{
   fseek((FILE*) user, n, SEEK_CUR);
}

static int stbi__stdio_eof(void *user)
{
   return feof((FILE*) user);
}

static stbi_io_callbacks stbi__stdio_callbacks =
{
   stbi__stdio_read,
   stbi__stdio_skip,
   stbi__stdio_eof,
};

static void stbi__start_file(stbi__context *s, FILE *f)
{
   stbi__start_callbacks(s, &stbi__stdio_callbacks, (void *) f);
}

//static void stop_file(stbi__context *s) { }

#endif // !STBI_NO_STDIO

static void stbi__rewind(stbi__context *s)
{
   // conceptually rewind SHOULD rewind to the beginning of the stream,
   // but we just rewind to the beginning of the initial buffer, because
   // we only use it after doing 'test', which never looks at more than 92 bytes
   s->img_buffer = s->img_buffer_original;
   s->img_buffer_end = s->img_buffer_original_end;
}

#ifndef STBI_NO_JPEG
static int      stbi__jpeg_test(stbi__context *s);
static stbi_uc *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
static int      stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp);
#endif

#ifndef STBI_NO_PNG
static int      stbi__png_test(stbi__context *s);
static stbi_uc *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
static int      stbi__png_info(stbi__context *s, int *x, int *y, int *comp);
#endif

#ifndef STBI_NO_BMP
static int      stbi__bmp_test(stbi__context *s);
static stbi_uc *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
static int      stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp);
#endif

#ifndef STBI_NO_TGA
static int      stbi__tga_test(stbi__context *s);
static stbi_uc *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
static int      stbi__tga_info(stbi__context *s, int *x, int *y, int *comp);
#endif

#ifndef STBI_NO_PSD
static int      stbi__psd_test(stbi__context *s);
static stbi_uc *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
static int      stbi__psd_info(stbi__context *s, int *x, int *y, int *comp);
#endif

#ifndef STBI_NO_HDR
static int      stbi__hdr_test(stbi__context *s);
static float   *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
static int      stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp);
#endif

#ifndef STBI_NO_PIC
static int      stbi__pic_test(stbi__context *s);
static stbi_uc *stbi__pic_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
static int      stbi__pic_info(stbi__context *s, int *x, int *y, int *comp);
#endif

#ifndef STBI_NO_GIF
static int      stbi__gif_test(stbi__context *s);
static stbi_uc *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
static int      stbi__gif_info(stbi__context *s, int *x, int *y, int *comp);
#endif

#ifndef STBI_NO_PNM
static int      stbi__pnm_test(stbi__context *s);
static stbi_uc *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp);
#endif

// this is not threadsafe
static const char *stbi__g_failure_reason;

STBIDEF const char *stbi_failure_reason(void)
{
   return stbi__g_failure_reason;
}

static int stbi__err(const char *str)
{
   stbi__g_failure_reason = str;
   return 0;
}

static void *stbi__malloc(size_t size)
{
    return STBI_MALLOC(size);
}

// stbi__err - error
// stbi__errpf - error returning pointer to float
// stbi__errpuc - error returning pointer to unsigned char

#ifdef STBI_NO_FAILURE_STRINGS
   #define stbi__err(x,y)  0
#elif defined(STBI_FAILURE_USERMSG)
   #define stbi__err(x,y)  stbi__err(y)
#else
   #define stbi__err(x,y)  stbi__err(x)
#endif

#define stbi__errpf(x,y)   ((float *)(size_t) (stbi__err(x,y)?NULL:NULL))
#define stbi__errpuc(x,y)  ((unsigned char *)(size_t) (stbi__err(x,y)?NULL:NULL))

STBIDEF void stbi_image_free(void *retval_from_stbi_load)
{
   STBI_FREE(retval_from_stbi_load);
}

#ifndef STBI_NO_LINEAR
static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp);
#endif

#ifndef STBI_NO_HDR
static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp);
#endif

static int stbi__vertically_flip_on_load = 0;

STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip)
{
    stbi__vertically_flip_on_load = flag_true_if_should_flip;
}

940 static unsigned char *stbi__load_main(stbi__context *s, int *x, int *y, int *comp, int req_comp)
942 #ifndef STBI_NO_JPEG
943 if (stbi__jpeg_test(s)) return stbi__jpeg_load(s,x,y,comp,req_comp);
944 #endif
945 #ifndef STBI_NO_PNG
946 if (stbi__png_test(s)) return stbi__png_load(s,x,y,comp,req_comp);
947 #endif
948 #ifndef STBI_NO_BMP
949 if (stbi__bmp_test(s)) return stbi__bmp_load(s,x,y,comp,req_comp);
950 #endif
951 #ifndef STBI_NO_GIF
952 if (stbi__gif_test(s)) return stbi__gif_load(s,x,y,comp,req_comp);
953 #endif
954 #ifndef STBI_NO_PSD
955 if (stbi__psd_test(s)) return stbi__psd_load(s,x,y,comp,req_comp);
956 #endif
957 #ifndef STBI_NO_PIC
958 if (stbi__pic_test(s)) return stbi__pic_load(s,x,y,comp,req_comp);
959 #endif
960 #ifndef STBI_NO_PNM
961 if (stbi__pnm_test(s)) return stbi__pnm_load(s,x,y,comp,req_comp);
962 #endif
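   #ifndef STBI_NO_HDR
   if (stbi__hdr_test(s)) {
      float *hdr = stbi__hdr_load(s, x,y,comp,req_comp);
      // stbi__hdr_load returns NULL on failure (failure reason already set);
      // don't hand a NULL buffer to the converter
      if (hdr == NULL) return NULL;
      return stbi__hdr_to_ldr(hdr, *x, *y, req_comp ? req_comp : *comp);
   }
   #endif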
971 #ifndef STBI_NO_TGA
972 // test tga last because it's a crappy test!
973 if (stbi__tga_test(s))
974 return stbi__tga_load(s,x,y,comp,req_comp);
975 #endif
977 return stbi__errpuc("unknown image type", "Image not of any known type, or corrupt");
static unsigned char *stbi__load_flip(stbi__context *s, int *x, int *y, int *comp, int req_comp)
{
   unsigned char *result = stbi__load_main(s, x, y, comp, req_comp);

   if (stbi__vertically_flip_on_load && result != NULL) {
      int w = *x, h = *y;
      int depth = req_comp ? req_comp : *comp;
      int row,col,z;
      stbi_uc temp;

      // @OPTIMIZE: use a bigger temp buffer and memcpy multiple pixels at once
      for (row = 0; row < (h>>1); row++) {
         for (col = 0; col < w; col++) {
            for (z = 0; z < depth; z++) {
               temp = result[(row * w + col) * depth + z];
               result[(row * w + col) * depth + z] = result[((h - row - 1) * w + col) * depth + z];
               result[((h - row - 1) * w + col) * depth + z] = temp;
            }
         }
      }
   }

   return result;
}
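
// Illustrative sketch only, not part of stb_image: one way the @OPTIMIZE note
// in stbi__load_flip could be realized, swapping whole rows with memcpy through
// a temporary row buffer instead of exchanging one byte at a time. The name
// flip_rows_in_place and its return convention are assumptions made for this
// example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: vertically flips an interleaved 8-bit image in place.
   w, h are the image size in pixels, depth is bytes per pixel.
   Returns 1 on success, 0 if the temporary row could not be allocated. */
static int flip_rows_in_place(unsigned char *pixels, int w, int h, int depth)
{
   size_t stride = (size_t) w * (size_t) depth;   /* bytes per row */
   unsigned char *tmp;
   int row;

   if (stride == 0 || h < 2) return 1;            /* nothing to swap */

   tmp = (unsigned char *) malloc(stride);
   if (tmp == NULL) return 0;

   /* swap row i with row h-1-i; the middle row of an odd-height image stays put */
   for (row = 0; row < (h >> 1); ++row) {
      unsigned char *top = pixels + (size_t) row * stride;
      unsigned char *bot = pixels + (size_t) (h - row - 1) * stride;
      memcpy(tmp, top, stride);
      memcpy(top, bot, stride);
      memcpy(bot, tmp, stride);
   }

   free(tmp);
   return 1;
}
```

// Unlike the byte-wise loop, this variant needs one allocation per call, so a
// caller would have to decide what to do if that allocation fails.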
1005 #ifndef STBI_NO_HDR
1006 static void stbi__float_postprocess(float *result, int *x, int *y, int *comp, int req_comp)
1008 if (stbi__vertically_flip_on_load && result != NULL) {
1009 int w = *x, h = *y;
1010 int depth = req_comp ? req_comp : *comp;
1011 int row,col,z;
1012 float temp;
1014 // @OPTIMIZE: use a bigger temp buffer and memcpy multiple pixels at once
1015 for (row = 0; row < (h>>1); row++) {
1016 for (col = 0; col < w; col++) {
1017 for (z = 0; z < depth; z++) {
1018 temp = result[(row * w + col) * depth + z];
1019 result[(row * w + col) * depth + z] = result[((h - row - 1) * w + col) * depth + z];
1020 result[((h - row - 1) * w + col) * depth + z] = temp;
1026 #endif
1028 #ifndef STBI_NO_STDIO
1030 static FILE *stbi__fopen(char const *filename, char const *mode)
1032 FILE *f;
1033 #if defined(_MSC_VER) && _MSC_VER >= 1400
1034 if (0 != fopen_s(&f, filename, mode))
1035 f=0;
1036 #else
1037 f = fopen(filename, mode);
1038 #endif
1039 return f;
1043 STBIDEF stbi_uc *stbi_load(char const *filename, int *x, int *y, int *comp, int req_comp)
1045 FILE *f = stbi__fopen(filename, "rb");
1046 unsigned char *result;
1047 if (!f) return stbi__errpuc("can't fopen", "Unable to open file");
1048 result = stbi_load_from_file(f,x,y,comp,req_comp);
1049 fclose(f);
1050 return result;
1053 STBIDEF stbi_uc *stbi_load_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
1055 unsigned char *result;
1056 stbi__context s;
1057 stbi__start_file(&s,f);
1058 result = stbi__load_flip(&s,x,y,comp,req_comp);
1059 if (result) {
1060 // need to 'unget' all the characters in the IO buffer
1061 fseek(f, - (int) (s.img_buffer_end - s.img_buffer), SEEK_CUR);
1063 return result;
1065 #endif //!STBI_NO_STDIO
1067 STBIDEF stbi_uc *stbi_load_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
1069 stbi__context s;
1070 stbi__start_mem(&s,buffer,len);
1071 return stbi__load_flip(&s,x,y,comp,req_comp);
1074 STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
1076 stbi__context s;
1077 stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
1078 return stbi__load_flip(&s,x,y,comp,req_comp);
1081 #ifndef STBI_NO_LINEAR
1082 static float *stbi__loadf_main(stbi__context *s, int *x, int *y, int *comp, int req_comp)
1084 unsigned char *data;
1085 #ifndef STBI_NO_HDR
1086 if (stbi__hdr_test(s)) {
1087 float *hdr_data = stbi__hdr_load(s,x,y,comp,req_comp);
1088 if (hdr_data)
1089 stbi__float_postprocess(hdr_data,x,y,comp,req_comp);
1090 return hdr_data;
1092 #endif
1093 data = stbi__load_flip(s, x, y, comp, req_comp);
1094 if (data)
1095 return stbi__ldr_to_hdr(data, *x, *y, req_comp ? req_comp : *comp);
1096 return stbi__errpf("unknown image type", "Image not of any known type, or corrupt");
1099 STBIDEF float *stbi_loadf_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
1101 stbi__context s;
1102 stbi__start_mem(&s,buffer,len);
1103 return stbi__loadf_main(&s,x,y,comp,req_comp);
1106 STBIDEF float *stbi_loadf_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
1108 stbi__context s;
1109 stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
1110 return stbi__loadf_main(&s,x,y,comp,req_comp);
1113 #ifndef STBI_NO_STDIO
1114 STBIDEF float *stbi_loadf(char const *filename, int *x, int *y, int *comp, int req_comp)
1116 float *result;
1117 FILE *f = stbi__fopen(filename, "rb");
1118 if (!f) return stbi__errpf("can't fopen", "Unable to open file");
1119 result = stbi_loadf_from_file(f,x,y,comp,req_comp);
1120 fclose(f);
1121 return result;
1124 STBIDEF float *stbi_loadf_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
1126 stbi__context s;
1127 stbi__start_file(&s,f);
1128 return stbi__loadf_main(&s,x,y,comp,req_comp);
1130 #endif // !STBI_NO_STDIO
1132 #endif // !STBI_NO_LINEAR
// these is-hdr-or-not functions are defined regardless of which features are
// compiled in, for API simplicity; if STBI_NO_HDR is defined, they always
// report false!
1138 STBIDEF int stbi_is_hdr_from_memory(stbi_uc const *buffer, int len)
1140 #ifndef STBI_NO_HDR
1141 stbi__context s;
1142 stbi__start_mem(&s,buffer,len);
1143 return stbi__hdr_test(&s);
1144 #else
1145 STBI_NOTUSED(buffer);
1146 STBI_NOTUSED(len);
1147 return 0;
1148 #endif
1151 #ifndef STBI_NO_STDIO
1152 STBIDEF int stbi_is_hdr (char const *filename)
1154 FILE *f = stbi__fopen(filename, "rb");
1155 int result=0;
1156 if (f) {
1157 result = stbi_is_hdr_from_file(f);
1158 fclose(f);
1160 return result;
1163 STBIDEF int stbi_is_hdr_from_file(FILE *f)
1165 #ifndef STBI_NO_HDR
1166 stbi__context s;
1167 stbi__start_file(&s,f);
1168 return stbi__hdr_test(&s);
1169 #else
1170 STBI_NOTUSED(f);
1171 return 0;
1172 #endif
1174 #endif // !STBI_NO_STDIO
1176 STBIDEF int stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user)
1178 #ifndef STBI_NO_HDR
1179 stbi__context s;
1180 stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
1181 return stbi__hdr_test(&s);
1182 #else
1183 STBI_NOTUSED(clbk);
1184 STBI_NOTUSED(user);
1185 return 0;
1186 #endif
1189 static float stbi__h2l_gamma_i=1.0f/2.2f, stbi__h2l_scale_i=1.0f;
1190 static float stbi__l2h_gamma=2.2f, stbi__l2h_scale=1.0f;
1192 #ifndef STBI_NO_LINEAR
1193 STBIDEF void stbi_ldr_to_hdr_gamma(float gamma) { stbi__l2h_gamma = gamma; }
1194 STBIDEF void stbi_ldr_to_hdr_scale(float scale) { stbi__l2h_scale = scale; }
1195 #endif
1197 STBIDEF void stbi_hdr_to_ldr_gamma(float gamma) { stbi__h2l_gamma_i = 1/gamma; }
1198 STBIDEF void stbi_hdr_to_ldr_scale(float scale) { stbi__h2l_scale_i = 1/scale; }
1201 //////////////////////////////////////////////////////////////////////////////
1203 // Common code used by all image loaders
1206 enum
1208 STBI__SCAN_load=0,
1209 STBI__SCAN_type,
1210 STBI__SCAN_header
1213 static void stbi__refill_buffer(stbi__context *s)
1215 int n = (s->io.read)(s->io_user_data,(char*)s->buffer_start,s->buflen);
1216 if (n == 0) {
1217 // at end of file, treat same as if from memory, but need to handle case
1218 // where s->img_buffer isn't pointing to safe memory, e.g. 0-byte file
1219 s->read_from_callbacks = 0;
1220 s->img_buffer = s->buffer_start;
1221 s->img_buffer_end = s->buffer_start+1;
1222 *s->img_buffer = 0;
1223 } else {
1224 s->img_buffer = s->buffer_start;
1225 s->img_buffer_end = s->buffer_start + n;
1229 stbi_inline static stbi_uc stbi__get8(stbi__context *s)
1231 if (s->img_buffer < s->img_buffer_end)
1232 return *s->img_buffer++;
1233 if (s->read_from_callbacks) {
1234 stbi__refill_buffer(s);
1235 return *s->img_buffer++;
1237 return 0;
1240 stbi_inline static int stbi__at_eof(stbi__context *s)
1242 if (s->io.read) {
1243 if (!(s->io.eof)(s->io_user_data)) return 0;
1244 // if feof() is true, check if buffer = end
1245 // special case: we've only got the special 0 character at the end
1246 if (s->read_from_callbacks == 0) return 1;
1249 return s->img_buffer >= s->img_buffer_end;
1252 static void stbi__skip(stbi__context *s, int n)
1254 if (n < 0) {
1255 s->img_buffer = s->img_buffer_end;
1256 return;
1258 if (s->io.read) {
1259 int blen = (int) (s->img_buffer_end - s->img_buffer);
1260 if (blen < n) {
1261 s->img_buffer = s->img_buffer_end;
1262 (s->io.skip)(s->io_user_data, n - blen);
1263 return;
1266 s->img_buffer += n;
1269 static int stbi__getn(stbi__context *s, stbi_uc *buffer, int n)
1271 if (s->io.read) {
1272 int blen = (int) (s->img_buffer_end - s->img_buffer);
1273 if (blen < n) {
1274 int res, count;
1276 memcpy(buffer, s->img_buffer, blen);
1278 count = (s->io.read)(s->io_user_data, (char*) buffer + blen, n - blen);
1279 res = (count == (n-blen));
1280 s->img_buffer = s->img_buffer_end;
1281 return res;
1285 if (s->img_buffer+n <= s->img_buffer_end) {
1286 memcpy(buffer, s->img_buffer, n);
1287 s->img_buffer += n;
1288 return 1;
1289 } else
1290 return 0;
static int stbi__get16be(stbi__context *s)
{
   int z = stbi__get8(s);
   return (z << 8) + stbi__get8(s);
}

static stbi__uint32 stbi__get32be(stbi__context *s)
{
   stbi__uint32 z = stbi__get16be(s);
   return (z << 16) + stbi__get16be(s);
}
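
// Illustrative sketch only, not part of stb_image: the same byte-at-a-time
// big-endian assembly, written against a plain memory cursor so it can be tried
// in isolation. byte_cursor and the cursor_* functions are hypothetical names;
// like stbi__get8, reading past the end yields 0 rather than an error.

```c
/* Hypothetical stand-in for stbi__context: a read position and an end pointer. */
typedef struct
{
   const unsigned char *p;
   const unsigned char *end;
} byte_cursor;

/* Returns the next byte, or 0 once the buffer is exhausted. */
static int cursor_get8(byte_cursor *c)
{
   return (c->p < c->end) ? *c->p++ : 0;
}

/* The high byte is read first; the two reads are separate statements so their
   order is well defined. */
static int cursor_get16be(byte_cursor *c)
{
   int z = cursor_get8(c);
   return (z << 8) + cursor_get8(c);
}

static unsigned long cursor_get32be(byte_cursor *c)
{
   unsigned long z = (unsigned long) cursor_get16be(c);
   return (z << 16) + (unsigned long) cursor_get16be(c);
}
```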
1305 #if defined(STBI_NO_BMP) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF)
1306 // nothing
1307 #else
1308 static int stbi__get16le(stbi__context *s)
1310 int z = stbi__get8(s);
1311 return z + (stbi__get8(s) << 8);
1313 #endif
1315 #ifndef STBI_NO_BMP
1316 static stbi__uint32 stbi__get32le(stbi__context *s)
1318 stbi__uint32 z = stbi__get16le(s);
1319 return z + (stbi__get16le(s) << 16);
1321 #endif
1323 #define STBI__BYTECAST(x) ((stbi_uc) ((x) & 255)) // truncate int to byte without warnings
1326 //////////////////////////////////////////////////////////////////////////////
1328 // generic converter from built-in img_n to req_comp
1329 // individual types do this automatically as much as possible (e.g. jpeg
1330 // does all cases internally since it needs to colorspace convert anyway,
1331 // and it never has alpha, so very few cases ). png can automatically
1332 // interleave an alpha=255 channel, but falls back to this for other cases
1334 // assume data buffer is malloced, so malloc a new one and free that one
1335 // only failure mode is malloc failing
static stbi_uc stbi__compute_y(int r, int g, int b)
{
   return (stbi_uc) (((r*77) + (g*150) + (29*b)) >> 8);
}
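
// Illustrative sketch only, not part of stb_image: the weights 77, 150 and 29
// sum to 256, so the >> 8 divides by the total weight and white (255,255,255)
// maps to exactly 255. luma_from_rgb is a hypothetical copy of the formula
// above, kept separate so it can be exercised on its own.

```c
/* Hypothetical stand-in for stbi__compute_y, same integer weights. */
static unsigned char luma_from_rgb(int r, int g, int b)
{
   /* 77/256, 150/256 and 29/256 approximate the usual 0.299, 0.587, 0.114 */
   return (unsigned char) (((r*77) + (g*150) + (29*b)) >> 8);
}
```

// Because the shift truncates, pure primaries land slightly below the rounded
// value: pure red (255,0,0) gives 76, pure green 149, pure blue 28.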
1342 static unsigned char *stbi__convert_format(unsigned char *data, int img_n, int req_comp, unsigned int x, unsigned int y)
1344 int i,j;
1345 unsigned char *good;
1347 if (req_comp == img_n) return data;
1348 STBI_ASSERT(req_comp >= 1 && req_comp <= 4);
1350 good = (unsigned char *) stbi__malloc(req_comp * x * y);
1351 if (good == NULL) {
1352 STBI_FREE(data);
1353 return stbi__errpuc("outofmem", "Out of memory");
1356 for (j=0; j < (int) y; ++j) {
1357 unsigned char *src = data + j * x * img_n ;
1358 unsigned char *dest = good + j * x * req_comp;
1360 #define COMBO(a,b) ((a)*8+(b))
1361 #define CASE(a,b) case COMBO(a,b): for(i=x-1; i >= 0; --i, src += a, dest += b)
1362 // convert source image with img_n components to one with req_comp components;
1363 // avoid switch per pixel, so use switch per scanline and massive macros
1364 switch (COMBO(img_n, req_comp)) {
1365 CASE(1,2) dest[0]=src[0], dest[1]=255; break;
1366 CASE(1,3) dest[0]=dest[1]=dest[2]=src[0]; break;
1367 CASE(1,4) dest[0]=dest[1]=dest[2]=src[0], dest[3]=255; break;
1368 CASE(2,1) dest[0]=src[0]; break;
1369 CASE(2,3) dest[0]=dest[1]=dest[2]=src[0]; break;
1370 CASE(2,4) dest[0]=dest[1]=dest[2]=src[0], dest[3]=src[1]; break;
1371 CASE(3,4) dest[0]=src[0],dest[1]=src[1],dest[2]=src[2],dest[3]=255; break;
1372 CASE(3,1) dest[0]=stbi__compute_y(src[0],src[1],src[2]); break;
1373 CASE(3,2) dest[0]=stbi__compute_y(src[0],src[1],src[2]), dest[1] = 255; break;
1374 CASE(4,1) dest[0]=stbi__compute_y(src[0],src[1],src[2]); break;
1375 CASE(4,2) dest[0]=stbi__compute_y(src[0],src[1],src[2]), dest[1] = src[3]; break;
1376 CASE(4,3) dest[0]=src[0],dest[1]=src[1],dest[2]=src[2]; break;
1377 default: STBI_ASSERT(0);
1379 #undef CASE
1382 STBI_FREE(data);
1383 return good;
1386 #ifndef STBI_NO_LINEAR
1387 static float *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp)
1389 int i,k,n;
1390 float *output = (float *) stbi__malloc(x * y * comp * sizeof(float));
1391 if (output == NULL) { STBI_FREE(data); return stbi__errpf("outofmem", "Out of memory"); }
1392 // compute number of non-alpha components
1393 if (comp & 1) n = comp; else n = comp-1;
1394 for (i=0; i < x*y; ++i) {
1395 for (k=0; k < n; ++k) {
1396 output[i*comp + k] = (float) (pow(data[i*comp+k]/255.0f, stbi__l2h_gamma) * stbi__l2h_scale);
1398 if (k < comp) output[i*comp + k] = data[i*comp+k]/255.0f;
1400 STBI_FREE(data);
1401 return output;
1403 #endif
1405 #ifndef STBI_NO_HDR
1406 #define stbi__float2int(x) ((int) (x))
1407 static stbi_uc *stbi__hdr_to_ldr(float *data, int x, int y, int comp)
1409 int i,k,n;
1410 stbi_uc *output = (stbi_uc *) stbi__malloc(x * y * comp);
1411 if (output == NULL) { STBI_FREE(data); return stbi__errpuc("outofmem", "Out of memory"); }
1412 // compute number of non-alpha components
1413 if (comp & 1) n = comp; else n = comp-1;
1414 for (i=0; i < x*y; ++i) {
1415 for (k=0; k < n; ++k) {
1416 float z = (float) pow(data[i*comp+k]*stbi__h2l_scale_i, stbi__h2l_gamma_i) * 255 + 0.5f;
1417 if (z < 0) z = 0;
1418 if (z > 255) z = 255;
1419 output[i*comp + k] = (stbi_uc) stbi__float2int(z);
1421 if (k < comp) {
1422 float z = data[i*comp+k] * 255 + 0.5f;
1423 if (z < 0) z = 0;
1424 if (z > 255) z = 255;
1425 output[i*comp + k] = (stbi_uc) stbi__float2int(z);
1428 STBI_FREE(data);
1429 return output;
1431 #endif
1433 //////////////////////////////////////////////////////////////////////////////
1435 // "baseline" JPEG/JFIF decoder
1437 // simple implementation
1438 // - doesn't support delayed output of y-dimension
1439 // - simple interface (only one output format: 8-bit interleaved RGB)
1440 // - doesn't try to recover corrupt jpegs
1441 // - doesn't allow partial loading, loading multiple at once
1442 // - still fast on x86 (copying globals into locals doesn't help x86)
1443 // - allocates lots of intermediate memory (full size of all components)
1444 // - non-interleaved case requires this anyway
1445 // - allows good upsampling (see next)
1446 // high-quality
1447 // - upsampled channels are bilinearly interpolated, even across blocks
1448 // - quality integer IDCT derived from IJG's 'slow'
1449 // performance
1450 // - fast huffman; reasonable integer IDCT
1451 // - some SIMD kernels for common paths on targets with SSE2/NEON
1452 // - uses a lot of intermediate memory, could cache poorly
1454 #ifndef STBI_NO_JPEG
1456 // huffman decoding acceleration
1457 #define FAST_BITS 9 // larger handles more cases; smaller stomps less cache
1459 typedef struct
1461 stbi_uc fast[1 << FAST_BITS];
1462 // weirdly, repacking this into AoS is a 10% speed loss, instead of a win
1463 stbi__uint16 code[256];
1464 stbi_uc values[256];
1465 stbi_uc size[257];
1466 unsigned int maxcode[18];
1467 int delta[17]; // old 'firstsymbol' - old 'firstcode'
1468 } stbi__huffman;
1470 typedef struct
1472 stbi__context *s;
1473 stbi__huffman huff_dc[4];
1474 stbi__huffman huff_ac[4];
1475 stbi_uc dequant[4][64];
1476 stbi__int16 fast_ac[4][1 << FAST_BITS];
1478 // sizes for components, interleaved MCUs
1479 int img_h_max, img_v_max;
1480 int img_mcu_x, img_mcu_y;
1481 int img_mcu_w, img_mcu_h;
1483 // definition of jpeg image component
1484 struct
1486 int id;
1487 int h,v;
1488 int tq;
1489 int hd,ha;
1490 int dc_pred;
1492 int x,y,w2,h2;
1493 stbi_uc *data;
1494 void *raw_data, *raw_coeff;
1495 stbi_uc *linebuf;
1496 short *coeff; // progressive only
1497 int coeff_w, coeff_h; // number of 8x8 coefficient blocks
1498 } img_comp[4];
1500 stbi__uint32 code_buffer; // jpeg entropy-coded buffer
1501 int code_bits; // number of valid bits
1502 unsigned char marker; // marker seen while filling entropy buffer
1503 int nomore; // flag if we saw a marker so must stop
1505 int progressive;
1506 int spec_start;
1507 int spec_end;
1508 int succ_high;
1509 int succ_low;
1510 int eob_run;
1512 int scan_n, order[4];
1513 int restart_interval, todo;
1515 // kernels
1516 void (*idct_block_kernel)(stbi_uc *out, int out_stride, short data[64]);
1517 void (*YCbCr_to_RGB_kernel)(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step);
1518 stbi_uc *(*resample_row_hv_2_kernel)(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs);
1519 } stbi__jpeg;
1521 static int stbi__build_huffman(stbi__huffman *h, int *count)
1523 int i,j,k=0,code;
1524 // build size list for each symbol (from JPEG spec)
1525 for (i=0; i < 16; ++i)
1526 for (j=0; j < count[i]; ++j)
1527 h->size[k++] = (stbi_uc) (i+1);
1528 h->size[k] = 0;
1530 // compute actual symbols (from jpeg spec)
1531 code = 0;
1532 k = 0;
1533 for(j=1; j <= 16; ++j) {
1534 // compute delta to add to code to compute symbol id
1535 h->delta[j] = k - code;
1536 if (h->size[k] == j) {
1537 while (h->size[k] == j)
1538 h->code[k++] = (stbi__uint16) (code++);
1539 if (code-1 >= (1 << j)) return stbi__err("bad code lengths","Corrupt JPEG");
1541 // compute largest code + 1 for this size, preshifted as needed later
1542 h->maxcode[j] = code << (16-j);
1543 code <<= 1;
1545 h->maxcode[j] = 0xffffffff;
1547 // build non-spec acceleration table; 255 is flag for not-accelerated
1548 memset(h->fast, 255, 1 << FAST_BITS);
1549 for (i=0; i < k; ++i) {
1550 int s = h->size[i];
1551 if (s <= FAST_BITS) {
1552 int c = h->code[i] << (FAST_BITS-s);
1553 int m = 1 << (FAST_BITS-s);
1554 for (j=0; j < m; ++j) {
1555 h->fast[c+j] = (stbi_uc) i;
1559 return 1;
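
// Illustrative sketch only, not part of stb_image: the canonical code
// assignment that stbi__build_huffman performs, reduced to its core. Codes of
// each length are handed out consecutively, and the running code is doubled
// when moving to the next length. assign_canonical_codes and its -1 error
// convention are assumptions made for this example.

```c
/* counts[i] is the number of codes of length i+1 (i = 0..15), as in a JPEG
   DHT segment. Writes one (code, length) pair per symbol index.
   Returns the number of symbols, or -1 if the lengths cannot form a prefix code
   or describe more than 256 symbols. */
static int assign_canonical_codes(const int counts[16],
                                  unsigned short code_out[256],
                                  unsigned char size_out[256])
{
   unsigned int code = 0;
   int len, i, k = 0;

   for (len = 1; len <= 16; ++len) {
      for (i = 0; i < counts[len-1]; ++i) {
         if (k >= 256) return -1;                 /* too many symbols */
         if (code >= (1u << len)) return -1;      /* code does not fit in len bits */
         size_out[k] = (unsigned char) len;
         code_out[k] = (unsigned short) code;
         ++k;
         ++code;
      }
      code <<= 1;                                 /* next length: append a 0 bit */
   }
   return k;
}
```

// With one code of length 2 and two codes of length 3, the symbols receive
// 00, 010 and 011 in that order.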
1562 // build a table that decodes both magnitude and value of small ACs in
1563 // one go.
1564 static void stbi__build_fast_ac(stbi__int16 *fast_ac, stbi__huffman *h)
1566 int i;
1567 for (i=0; i < (1 << FAST_BITS); ++i) {
1568 stbi_uc fast = h->fast[i];
1569 fast_ac[i] = 0;
1570 if (fast < 255) {
1571 int rs = h->values[fast];
1572 int run = (rs >> 4) & 15;
1573 int magbits = rs & 15;
1574 int len = h->size[fast];
1576 if (magbits && len + magbits <= FAST_BITS) {
1577 // magnitude code followed by receive_extend code
1578 int k = ((i << len) & ((1 << FAST_BITS) - 1)) >> (FAST_BITS - magbits);
1579 int m = 1 << (magbits - 1);
            if (k < m) k -= (1 << magbits) - 1; // same value as adding (-1 << magbits) + 1, without left-shifting a negative number
1581 // if the result is small enough, we can fit it in fast_ac table
1582 if (k >= -128 && k <= 127)
1583 fast_ac[i] = (stbi__int16) ((k << 8) + (run << 4) + (len + magbits));
1589 static void stbi__grow_buffer_unsafe(stbi__jpeg *j)
1591 do {
1592 int b = j->nomore ? 0 : stbi__get8(j->s);
1593 if (b == 0xff) {
1594 int c = stbi__get8(j->s);
1595 if (c != 0) {
1596 j->marker = (unsigned char) c;
1597 j->nomore = 1;
1598 return;
1601 j->code_buffer |= b << (24 - j->code_bits);
1602 j->code_bits += 8;
1603 } while (j->code_bits <= 24);
1606 // (1 << n) - 1
1607 static stbi__uint32 stbi__bmask[17]={0,1,3,7,15,31,63,127,255,511,1023,2047,4095,8191,16383,32767,65535};
1609 // decode a jpeg huffman value from the bitstream
1610 stbi_inline static int stbi__jpeg_huff_decode(stbi__jpeg *j, stbi__huffman *h)
1612 unsigned int temp;
1613 int c,k;
1615 if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
   // look at the top FAST_BITS and determine what symbol ID it is,
   // if the code length is <= FAST_BITS
1619 c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
1620 k = h->fast[c];
1621 if (k < 255) {
1622 int s = h->size[k];
1623 if (s > j->code_bits)
1624 return -1;
1625 j->code_buffer <<= s;
1626 j->code_bits -= s;
1627 return h->values[k];
1630 // naive test is to shift the code_buffer down so k bits are
1631 // valid, then test against maxcode. To speed this up, we've
1632 // preshifted maxcode left so that it has (16-k) 0s at the
1633 // end; in other words, regardless of the number of bits, it
1634 // wants to be compared against something shifted to have 16;
1635 // that way we don't need to shift inside the loop.
1636 temp = j->code_buffer >> 16;
1637 for (k=FAST_BITS+1 ; ; ++k)
1638 if (temp < h->maxcode[k])
1639 break;
1640 if (k == 17) {
1641 // error! code not found
1642 j->code_bits -= 16;
1643 return -1;
1646 if (k > j->code_bits)
1647 return -1;
1649 // convert the huffman code to the symbol id
1650 c = ((j->code_buffer >> (32 - k)) & stbi__bmask[k]) + h->delta[k];
1651 STBI_ASSERT((((j->code_buffer) >> (32 - h->size[c])) & stbi__bmask[h->size[c]]) == h->code[c]);
1653 // convert the id to a symbol
1654 j->code_bits -= k;
1655 j->code_buffer <<= k;
1656 return h->values[c];
1659 // bias[n] = (-1<<n) + 1
1660 static int const stbi__jbias[16] = {0,-1,-3,-7,-15,-31,-63,-127,-255,-511,-1023,-2047,-4095,-8191,-16383,-32767};
1662 // combined JPEG 'receive' and JPEG 'extend', since baseline
1663 // always extends everything it receives.
1664 stbi_inline static int stbi__extend_receive(stbi__jpeg *j, int n)
1666 unsigned int k;
1667 int sgn;
1668 if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
1670 sgn = (stbi__int32)j->code_buffer >> 31; // sign bit is always in MSB
1671 k = stbi_lrot(j->code_buffer, n);
1672 STBI_ASSERT(n >= 0 && n < (int) (sizeof(stbi__bmask)/sizeof(*stbi__bmask)));
1673 j->code_buffer = k & ~stbi__bmask[n];
1674 k &= stbi__bmask[n];
1675 j->code_bits -= n;
1676 return k + (stbi__jbias[n] & ~sgn);
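
// Illustrative sketch only, not part of stb_image: the JPEG "extend" step that
// stbi__extend_receive folds into its bit read, shown on an already extracted
// n-bit field. If the first (most significant) bit of the field is 1 the value
// is positive and used as is; otherwise the bias -(2^n) + 1 is added.
// jpeg_extend is a hypothetical name.

```c
/* bits holds an n-bit magnitude field in its low n bits, 1 <= n <= 15.
   Returns the signed coefficient value it encodes. */
static int jpeg_extend(unsigned int bits, int n)
{
   int v = (int) (bits & ((1u << n) - 1u));
   /* top bit clear means negative: add the bias (-(1 << n) + 1) */
   if (v < (1 << (n - 1)))
      v -= (1 << n) - 1;
   return v;
}
```

// For n = 3 the fields 000..011 decode to -7..-4 and 100..111 decode to 4..7.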
1679 // get some unsigned bits
1680 stbi_inline static int stbi__jpeg_get_bits(stbi__jpeg *j, int n)
1682 unsigned int k;
1683 if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
1684 k = stbi_lrot(j->code_buffer, n);
1685 j->code_buffer = k & ~stbi__bmask[n];
1686 k &= stbi__bmask[n];
1687 j->code_bits -= n;
1688 return k;
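
// Illustrative sketch only, not part of stb_image: how the rotate-and-mask in
// stbi__jpeg_get_bits pulls n bits off the top of a left-aligned 32-bit window.
// msb_bit_window and take_bits are hypothetical names, and refilling the window
// (the job of stbi__grow_buffer_unsafe) is left out.

```c
#include <stdint.h>

/* Valid bits sit at the most significant end of buffer. */
typedef struct
{
   uint32_t buffer;
   int bits;        /* number of valid bits in buffer */
} msb_bit_window;

/* Removes and returns the top n bits, 1 <= n <= 16 and n <= w->bits. */
static uint32_t take_bits(msb_bit_window *w, int n)
{
   uint32_t mask = ((uint32_t) 1 << n) - 1u;
   /* rotate left by n: the wanted bits wrap around into the low n bits */
   uint32_t k = (w->buffer << n) | (w->buffer >> (32 - n));
   w->buffer = k & ~mask;   /* what remains is left-aligned again */
   w->bits -= n;
   return k & mask;
}
```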
1691 stbi_inline static int stbi__jpeg_get_bit(stbi__jpeg *j)
1693 unsigned int k;
1694 if (j->code_bits < 1) stbi__grow_buffer_unsafe(j);
1695 k = j->code_buffer;
1696 j->code_buffer <<= 1;
1697 --j->code_bits;
1698 return k & 0x80000000;
1701 // given a value that's at position X in the zigzag stream,
1702 // where does it appear in the 8x8 matrix coded as row-major?
1703 static stbi_uc stbi__jpeg_dezigzag[64+15] =
1705 0, 1, 8, 16, 9, 2, 3, 10,
1706 17, 24, 32, 25, 18, 11, 4, 5,
1707 12, 19, 26, 33, 40, 48, 41, 34,
1708 27, 20, 13, 6, 7, 14, 21, 28,
1709 35, 42, 49, 56, 57, 50, 43, 36,
1710 29, 22, 15, 23, 30, 37, 44, 51,
1711 58, 59, 52, 45, 38, 31, 39, 46,
1712 53, 60, 61, 54, 47, 55, 62, 63,
1713 // let corrupt input sample past end
1714 63, 63, 63, 63, 63, 63, 63, 63,
1715 63, 63, 63, 63, 63, 63, 63
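
// Illustrative sketch only, not part of stb_image: the first 64 entries of the
// table above can be generated by walking the anti-diagonals of the 8x8 block,
// alternating direction on each diagonal. build_dezigzag is a hypothetical
// helper; the 15 padding entries for corrupt input are not produced here.

```c
/* Fills table[i] with the row-major index (row*8 + col) of the coefficient
   that appears at position i in zigzag order. */
static void build_dezigzag(unsigned char table[64])
{
   int d, k = 0;
   for (d = 0; d < 15; ++d) {              /* d = row + col, one anti-diagonal */
      int row, col;
      if (d & 1) {
         /* odd diagonal: start at the top-right end, walk down-left */
         col = d < 8 ? d : 7;
         row = d - col;
         while (col >= 0 && row < 8) {
            table[k++] = (unsigned char) (row * 8 + col);
            ++row;
            --col;
         }
      } else {
         /* even diagonal: start at the bottom-left end, walk up-right */
         row = d < 8 ? d : 7;
         col = d - row;
         while (row >= 0 && col < 8) {
            table[k++] = (unsigned char) (row * 8 + col);
            --row;
            ++col;
         }
      }
   }
}
```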
// decode one 64-entry block
1719 static int stbi__jpeg_decode_block(stbi__jpeg *j, short data[64], stbi__huffman *hdc, stbi__huffman *hac, stbi__int16 *fac, int b, stbi_uc *dequant)
1721 int diff,dc,k;
1722 int t;
1724 if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
1725 t = stbi__jpeg_huff_decode(j, hdc);
1726 if (t < 0) return stbi__err("bad huffman code","Corrupt JPEG");
1728 // 0 all the ac values now so we can do it 32-bits at a time
1729 memset(data,0,64*sizeof(data[0]));
1731 diff = t ? stbi__extend_receive(j, t) : 0;
1732 dc = j->img_comp[b].dc_pred + diff;
1733 j->img_comp[b].dc_pred = dc;
1734 data[0] = (short) (dc * dequant[0]);
1736 // decode AC components, see JPEG spec
1737 k = 1;
1738 do {
1739 unsigned int zig;
1740 int c,r,s;
1741 if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
1742 c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
1743 r = fac[c];
1744 if (r) { // fast-AC path
1745 k += (r >> 4) & 15; // run
1746 s = r & 15; // combined length
1747 j->code_buffer <<= s;
1748 j->code_bits -= s;
1749 // decode into unzigzag'd location
1750 zig = stbi__jpeg_dezigzag[k++];
1751 data[zig] = (short) ((r >> 8) * dequant[zig]);
1752 } else {
1753 int rs = stbi__jpeg_huff_decode(j, hac);
1754 if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
1755 s = rs & 15;
1756 r = rs >> 4;
1757 if (s == 0) {
1758 if (rs != 0xf0) break; // end block
1759 k += 16;
1760 } else {
1761 k += r;
1762 // decode into unzigzag'd location
1763 zig = stbi__jpeg_dezigzag[k++];
1764 data[zig] = (short) (stbi__extend_receive(j,s) * dequant[zig]);
1767 } while (k < 64);
1768 return 1;
1771 static int stbi__jpeg_decode_block_prog_dc(stbi__jpeg *j, short data[64], stbi__huffman *hdc, int b)
1773 int diff,dc;
1774 int t;
1775 if (j->spec_end != 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
1777 if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
1779 if (j->succ_high == 0) {
1780 // first scan for DC coefficient, must be first
1781 memset(data,0,64*sizeof(data[0])); // 0 all the ac values now
1782 t = stbi__jpeg_huff_decode(j, hdc);
1783 diff = t ? stbi__extend_receive(j, t) : 0;
1785 dc = j->img_comp[b].dc_pred + diff;
1786 j->img_comp[b].dc_pred = dc;
1787 data[0] = (short) (dc << j->succ_low);
1788 } else {
1789 // refinement scan for DC coefficient
1790 if (stbi__jpeg_get_bit(j))
1791 data[0] += (short) (1 << j->succ_low);
1793 return 1;
1796 // @OPTIMIZE: store non-zigzagged during the decode passes,
1797 // and only de-zigzag when dequantizing
1798 static int stbi__jpeg_decode_block_prog_ac(stbi__jpeg *j, short data[64], stbi__huffman *hac, stbi__int16 *fac)
1800 int k;
1801 if (j->spec_start == 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
1803 if (j->succ_high == 0) {
1804 int shift = j->succ_low;
1806 if (j->eob_run) {
1807 --j->eob_run;
1808 return 1;
1811 k = j->spec_start;
1812 do {
1813 unsigned int zig;
1814 int c,r,s;
1815 if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
1816 c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
1817 r = fac[c];
1818 if (r) { // fast-AC path
1819 k += (r >> 4) & 15; // run
1820 s = r & 15; // combined length
1821 j->code_buffer <<= s;
1822 j->code_bits -= s;
1823 zig = stbi__jpeg_dezigzag[k++];
1824 data[zig] = (short) ((r >> 8) << shift);
1825 } else {
1826 int rs = stbi__jpeg_huff_decode(j, hac);
1827 if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
1828 s = rs & 15;
1829 r = rs >> 4;
1830 if (s == 0) {
1831 if (r < 15) {
1832 j->eob_run = (1 << r);
1833 if (r)
1834 j->eob_run += stbi__jpeg_get_bits(j, r);
1835 --j->eob_run;
1836 break;
1838 k += 16;
1839 } else {
1840 k += r;
1841 zig = stbi__jpeg_dezigzag[k++];
1842 data[zig] = (short) (stbi__extend_receive(j,s) << shift);
1845 } while (k <= j->spec_end);
1846 } else {
1847 // refinement scan for these AC coefficients
1849 short bit = (short) (1 << j->succ_low);
1851 if (j->eob_run) {
1852 --j->eob_run;
1853 for (k = j->spec_start; k <= j->spec_end; ++k) {
1854 short *p = &data[stbi__jpeg_dezigzag[k]];
1855 if (*p != 0)
1856 if (stbi__jpeg_get_bit(j))
1857 if ((*p & bit)==0) {
1858 if (*p > 0)
1859 *p += bit;
1860 else
1861 *p -= bit;
1864 } else {
1865 k = j->spec_start;
1866 do {
1867 int r,s;
1868 int rs = stbi__jpeg_huff_decode(j, hac); // @OPTIMIZE see if we can use the fast path here, advance-by-r is so slow, eh
1869 if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
1870 s = rs & 15;
1871 r = rs >> 4;
1872 if (s == 0) {
1873 if (r < 15) {
1874 j->eob_run = (1 << r) - 1;
1875 if (r)
1876 j->eob_run += stbi__jpeg_get_bits(j, r);
1877 r = 64; // force end of block
1878 } else {
1879 // r=15 s=0 should write 16 0s, so we just do
1880 // a run of 15 0s and then write s (which is 0),
1881 // so we don't have to do anything special here
1883 } else {
1884 if (s != 1) return stbi__err("bad huffman code", "Corrupt JPEG");
1885 // sign bit
1886 if (stbi__jpeg_get_bit(j))
1887 s = bit;
1888 else
1889 s = -bit;
1892 // advance by r
1893 while (k <= j->spec_end) {
1894 short *p = &data[stbi__jpeg_dezigzag[k++]];
1895 if (*p != 0) {
1896 if (stbi__jpeg_get_bit(j))
1897 if ((*p & bit)==0) {
1898 if (*p > 0)
1899 *p += bit;
1900 else
1901 *p -= bit;
1903 } else {
1904 if (r == 0) {
1905 *p = (short) s;
1906 break;
1908 --r;
1911 } while (k <= j->spec_end);
1914 return 1;
// clamp an int to the range 0..255 and return it as a byte
// (the IDCT adds the +128 level shift before calling this)
stbi_inline static stbi_uc stbi__clamp(int x)
{
   // trick to use a single test to catch both cases
   if ((unsigned int) x > 255) {
      if (x < 0) return 0;
      if (x > 255) return 255;
   }
   return (stbi_uc) x;
}
1928 #define stbi__f2f(x) ((int) (((x) * 4096 + 0.5)))
1929 #define stbi__fsh(x) ((x) << 12)
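
// Illustrative sketch only, not part of stb_image: what the two macros above
// do. stbi__f2f turns a real constant into a 12-bit fixed-point integer
// (scaled by 4096), and stbi__fsh scales an integer sample the same way so the
// two can be added. to_fixed12 and fixed12_scale are hypothetical names.

```c
/* Same arithmetic as stbi__f2f: scale by 4096, add 0.5, truncate toward zero.
   For negative constants this is not symmetric rounding; it matches the macro. */
static int to_fixed12(double x)
{
   return (int) (x * 4096 + 0.5);
}

/* Multiply a sample by a fixed-point constant and bring the result back to
   integer scale with rounding. Shown for non-negative products only; right
   shift of a negative int is implementation-defined in C. */
static int fixed12_scale(int sample, int k)
{
   return (sample * k + 2048) >> 12;
}
```

// For instance 0.5411961 becomes 2217, and scaling the sample 100 by it gives
// 54, against an exact product of about 54.12.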
1931 // derived from jidctint -- DCT_ISLOW
1932 #define STBI__IDCT_1D(s0,s1,s2,s3,s4,s5,s6,s7) \
1933 int t0,t1,t2,t3,p1,p2,p3,p4,p5,x0,x1,x2,x3; \
1934 p2 = s2; \
1935 p3 = s6; \
1936 p1 = (p2+p3) * stbi__f2f(0.5411961f); \
1937 t2 = p1 + p3*stbi__f2f(-1.847759065f); \
1938 t3 = p1 + p2*stbi__f2f( 0.765366865f); \
1939 p2 = s0; \
1940 p3 = s4; \
1941 t0 = stbi__fsh(p2+p3); \
1942 t1 = stbi__fsh(p2-p3); \
1943 x0 = t0+t3; \
1944 x3 = t0-t3; \
1945 x1 = t1+t2; \
1946 x2 = t1-t2; \
1947 t0 = s7; \
1948 t1 = s5; \
1949 t2 = s3; \
1950 t3 = s1; \
1951 p3 = t0+t2; \
1952 p4 = t1+t3; \
1953 p1 = t0+t3; \
1954 p2 = t1+t2; \
1955 p5 = (p3+p4)*stbi__f2f( 1.175875602f); \
1956 t0 = t0*stbi__f2f( 0.298631336f); \
1957 t1 = t1*stbi__f2f( 2.053119869f); \
1958 t2 = t2*stbi__f2f( 3.072711026f); \
1959 t3 = t3*stbi__f2f( 1.501321110f); \
1960 p1 = p5 + p1*stbi__f2f(-0.899976223f); \
1961 p2 = p5 + p2*stbi__f2f(-2.562915447f); \
1962 p3 = p3*stbi__f2f(-1.961570560f); \
1963 p4 = p4*stbi__f2f(-0.390180644f); \
1964 t3 += p1+p4; \
1965 t2 += p2+p3; \
1966 t1 += p2+p4; \
1967 t0 += p1+p3;
1969 static void stbi__idct_block(stbi_uc *out, int out_stride, short data[64])
1971 int i,val[64],*v=val;
1972 stbi_uc *o;
1973 short *d = data;
1975 // columns
1976 for (i=0; i < 8; ++i,++d, ++v) {
1977 // if all zeroes, shortcut -- this avoids dequantizing 0s and IDCTing
1978 if (d[ 8]==0 && d[16]==0 && d[24]==0 && d[32]==0
1979 && d[40]==0 && d[48]==0 && d[56]==0) {
1980 // no shortcut 0 seconds
1981 // (1|2|3|4|5|6|7)==0 0 seconds
1982 // all separate -0.047 seconds
1983 // 1 && 2|3 && 4|5 && 6|7: -0.047 seconds
1984 int dcterm = d[0] << 2;
1985 v[0] = v[8] = v[16] = v[24] = v[32] = v[40] = v[48] = v[56] = dcterm;
1986 } else {
1987 STBI__IDCT_1D(d[ 0],d[ 8],d[16],d[24],d[32],d[40],d[48],d[56])
1988 // constants scaled things up by 1<<12; let's bring them back
1989 // down, but keep 2 extra bits of precision
1990 x0 += 512; x1 += 512; x2 += 512; x3 += 512;
1991 v[ 0] = (x0+t3) >> 10;
1992 v[56] = (x0-t3) >> 10;
1993 v[ 8] = (x1+t2) >> 10;
1994 v[48] = (x1-t2) >> 10;
1995 v[16] = (x2+t1) >> 10;
1996 v[40] = (x2-t1) >> 10;
1997 v[24] = (x3+t0) >> 10;
1998 v[32] = (x3-t0) >> 10;
2002 for (i=0, v=val, o=out; i < 8; ++i,v+=8,o+=out_stride) {
2003 // no fast case since the first 1D IDCT spread components out
2004 STBI__IDCT_1D(v[0],v[1],v[2],v[3],v[4],v[5],v[6],v[7])
2005 // constants scaled things up by 1<<12, plus we had 1<<2 from first
2006 // loop, plus horizontal and vertical each scale by sqrt(8) so together
2007 // we've got an extra 1<<3, so 1<<17 total we need to remove.
2008 // so we want to round that, which means adding 0.5 * 1<<17,
2009 // aka 65536. Also, we'll end up with -128 to 127 that we want
2010 // to encode as 0..255 by adding 128, so we'll add that before the shift
2011 x0 += 65536 + (128<<17);
2012 x1 += 65536 + (128<<17);
2013 x2 += 65536 + (128<<17);
2014 x3 += 65536 + (128<<17);
2015 // tried computing the shifts into temps, or'ing the temps to see
2016 // if any were out of range, but that was slower
2017 o[0] = stbi__clamp((x0+t3) >> 17);
2018 o[7] = stbi__clamp((x0-t3) >> 17);
2019 o[1] = stbi__clamp((x1+t2) >> 17);
2020 o[6] = stbi__clamp((x1-t2) >> 17);
2021 o[2] = stbi__clamp((x2+t1) >> 17);
2022 o[5] = stbi__clamp((x2-t1) >> 17);
2023 o[3] = stbi__clamp((x3+t0) >> 17);
2024 o[4] = stbi__clamp((x3-t0) >> 17);
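// Illustration only, not part of stb_image: the row-pass comment above folds
// rounding and the +128 level shift into one addition before the >> 17.
// This sketch isolates that arithmetic; example_descale_and_bias is a
// hypothetical name.

```c
/* Hypothetical helper: remove the 17 scale bits with round-to-nearest and
   move the -128..127 range up to 0..255 in the same step. The result can
   still fall outside 0..255, which is why the decoder clamps afterwards. */
static int example_descale_and_bias(int x)
{
    const int half  = 1 << 16;     /* 0.5 in units of 1<<17 */
    const int level = 128 << 17;   /* becomes +128 after the shift */
    return (x + half + level) >> 17;
}
```

// An input of 0 maps to 128, an input of 10<<17 maps to 138, and an input of
// exactly half a unit (1<<16) rounds up to 129.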
2028 #ifdef STBI_SSE2
2029 // sse2 integer IDCT. not the fastest possible implementation but it
2030 // produces bit-identical results to the generic C version so it's
2031 // fully "transparent".
2032 static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
2034 // This is constructed to match our regular (generic) integer IDCT exactly.
2035 __m128i row0, row1, row2, row3, row4, row5, row6, row7;
2036 __m128i tmp;
2038 // dot product constant: even elems=x, odd elems=y
2039 #define dct_const(x,y) _mm_setr_epi16((x),(y),(x),(y),(x),(y),(x),(y))
2041 // out(0) = c0[even]*x + c0[odd]*y (c0, x, y 16-bit, out 32-bit)
2042 // out(1) = c1[even]*x + c1[odd]*y
2043 #define dct_rot(out0,out1, x,y,c0,c1) \
2044 __m128i c0##lo = _mm_unpacklo_epi16((x),(y)); \
2045 __m128i c0##hi = _mm_unpackhi_epi16((x),(y)); \
2046 __m128i out0##_l = _mm_madd_epi16(c0##lo, c0); \
2047 __m128i out0##_h = _mm_madd_epi16(c0##hi, c0); \
2048 __m128i out1##_l = _mm_madd_epi16(c0##lo, c1); \
2049 __m128i out1##_h = _mm_madd_epi16(c0##hi, c1)
2051 // out = in << 12 (in 16-bit, out 32-bit)
2052 #define dct_widen(out, in) \
2053 __m128i out##_l = _mm_srai_epi32(_mm_unpacklo_epi16(_mm_setzero_si128(), (in)), 4); \
2054 __m128i out##_h = _mm_srai_epi32(_mm_unpackhi_epi16(_mm_setzero_si128(), (in)), 4)
2056 // wide add
2057 #define dct_wadd(out, a, b) \
2058 __m128i out##_l = _mm_add_epi32(a##_l, b##_l); \
2059 __m128i out##_h = _mm_add_epi32(a##_h, b##_h)
2061 // wide sub
2062 #define dct_wsub(out, a, b) \
2063 __m128i out##_l = _mm_sub_epi32(a##_l, b##_l); \
2064 __m128i out##_h = _mm_sub_epi32(a##_h, b##_h)
2066 // butterfly a/b, add bias, then shift by "s" and pack
2067 #define dct_bfly32o(out0, out1, a,b,bias,s) \
2069 __m128i abiased_l = _mm_add_epi32(a##_l, bias); \
2070 __m128i abiased_h = _mm_add_epi32(a##_h, bias); \
2071 dct_wadd(sum, abiased, b); \
2072 dct_wsub(dif, abiased, b); \
2073 out0 = _mm_packs_epi32(_mm_srai_epi32(sum_l, s), _mm_srai_epi32(sum_h, s)); \
2074 out1 = _mm_packs_epi32(_mm_srai_epi32(dif_l, s), _mm_srai_epi32(dif_h, s)); \
2077 // 8-bit interleave step (for transposes)
2078 #define dct_interleave8(a, b) \
2079 tmp = a; \
2080 a = _mm_unpacklo_epi8(a, b); \
2081 b = _mm_unpackhi_epi8(tmp, b)
2083 // 16-bit interleave step (for transposes)
2084 #define dct_interleave16(a, b) \
2085 tmp = a; \
2086 a = _mm_unpacklo_epi16(a, b); \
2087 b = _mm_unpackhi_epi16(tmp, b)
2089 #define dct_pass(bias,shift) \
2091 /* even part */ \
2092 dct_rot(t2e,t3e, row2,row6, rot0_0,rot0_1); \
2093 __m128i sum04 = _mm_add_epi16(row0, row4); \
2094 __m128i dif04 = _mm_sub_epi16(row0, row4); \
2095 dct_widen(t0e, sum04); \
2096 dct_widen(t1e, dif04); \
2097 dct_wadd(x0, t0e, t3e); \
2098 dct_wsub(x3, t0e, t3e); \
2099 dct_wadd(x1, t1e, t2e); \
2100 dct_wsub(x2, t1e, t2e); \
2101 /* odd part */ \
2102 dct_rot(y0o,y2o, row7,row3, rot2_0,rot2_1); \
2103 dct_rot(y1o,y3o, row5,row1, rot3_0,rot3_1); \
2104 __m128i sum17 = _mm_add_epi16(row1, row7); \
2105 __m128i sum35 = _mm_add_epi16(row3, row5); \
2106 dct_rot(y4o,y5o, sum17,sum35, rot1_0,rot1_1); \
2107 dct_wadd(x4, y0o, y4o); \
2108 dct_wadd(x5, y1o, y5o); \
2109 dct_wadd(x6, y2o, y5o); \
2110 dct_wadd(x7, y3o, y4o); \
2111 dct_bfly32o(row0,row7, x0,x7,bias,shift); \
2112 dct_bfly32o(row1,row6, x1,x6,bias,shift); \
2113 dct_bfly32o(row2,row5, x2,x5,bias,shift); \
2114 dct_bfly32o(row3,row4, x3,x4,bias,shift); \
2117 __m128i rot0_0 = dct_const(stbi__f2f(0.5411961f), stbi__f2f(0.5411961f) + stbi__f2f(-1.847759065f));
2118 __m128i rot0_1 = dct_const(stbi__f2f(0.5411961f) + stbi__f2f( 0.765366865f), stbi__f2f(0.5411961f));
2119 __m128i rot1_0 = dct_const(stbi__f2f(1.175875602f) + stbi__f2f(-0.899976223f), stbi__f2f(1.175875602f));
2120 __m128i rot1_1 = dct_const(stbi__f2f(1.175875602f), stbi__f2f(1.175875602f) + stbi__f2f(-2.562915447f));
2121 __m128i rot2_0 = dct_const(stbi__f2f(-1.961570560f) + stbi__f2f( 0.298631336f), stbi__f2f(-1.961570560f));
2122 __m128i rot2_1 = dct_const(stbi__f2f(-1.961570560f), stbi__f2f(-1.961570560f) + stbi__f2f( 3.072711026f));
2123 __m128i rot3_0 = dct_const(stbi__f2f(-0.390180644f) + stbi__f2f( 2.053119869f), stbi__f2f(-0.390180644f));
2124 __m128i rot3_1 = dct_const(stbi__f2f(-0.390180644f), stbi__f2f(-0.390180644f) + stbi__f2f( 1.501321110f));
2126 // rounding biases in column/row passes, see stbi__idct_block for explanation.
2127 __m128i bias_0 = _mm_set1_epi32(512);
2128 __m128i bias_1 = _mm_set1_epi32(65536 + (128<<17));
2130 // load
2131 row0 = _mm_load_si128((const __m128i *) (data + 0*8));
2132 row1 = _mm_load_si128((const __m128i *) (data + 1*8));
2133 row2 = _mm_load_si128((const __m128i *) (data + 2*8));
2134 row3 = _mm_load_si128((const __m128i *) (data + 3*8));
2135 row4 = _mm_load_si128((const __m128i *) (data + 4*8));
2136 row5 = _mm_load_si128((const __m128i *) (data + 5*8));
2137 row6 = _mm_load_si128((const __m128i *) (data + 6*8));
2138 row7 = _mm_load_si128((const __m128i *) (data + 7*8));
2140 // column pass
2141 dct_pass(bias_0, 10);
2144 // 16bit 8x8 transpose pass 1
2145 dct_interleave16(row0, row4);
2146 dct_interleave16(row1, row5);
2147 dct_interleave16(row2, row6);
2148 dct_interleave16(row3, row7);
2150 // transpose pass 2
2151 dct_interleave16(row0, row2);
2152 dct_interleave16(row1, row3);
2153 dct_interleave16(row4, row6);
2154 dct_interleave16(row5, row7);
2156 // transpose pass 3
2157 dct_interleave16(row0, row1);
2158 dct_interleave16(row2, row3);
2159 dct_interleave16(row4, row5);
2160 dct_interleave16(row6, row7);
2163 // row pass
2164 dct_pass(bias_1, 17);
2167 // pack
2168 __m128i p0 = _mm_packus_epi16(row0, row1); // a0a1a2a3...a7b0b1b2b3...b7
2169 __m128i p1 = _mm_packus_epi16(row2, row3);
2170 __m128i p2 = _mm_packus_epi16(row4, row5);
2171 __m128i p3 = _mm_packus_epi16(row6, row7);
2173 // 8bit 8x8 transpose pass 1
2174 dct_interleave8(p0, p2); // a0e0a1e1...
2175 dct_interleave8(p1, p3); // c0g0c1g1...
2177 // transpose pass 2
2178 dct_interleave8(p0, p1); // a0c0e0g0...
2179 dct_interleave8(p2, p3); // b0d0f0h0...
2181 // transpose pass 3
2182 dct_interleave8(p0, p2); // a0b0c0d0...
2183 dct_interleave8(p1, p3); // a4b4c4d4...
2185 // store
2186 _mm_storel_epi64((__m128i *) out, p0); out += out_stride;
2187 _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p0, 0x4e)); out += out_stride;
2188 _mm_storel_epi64((__m128i *) out, p2); out += out_stride;
2189 _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p2, 0x4e)); out += out_stride;
2190 _mm_storel_epi64((__m128i *) out, p1); out += out_stride;
2191 _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p1, 0x4e)); out += out_stride;
2192 _mm_storel_epi64((__m128i *) out, p3); out += out_stride;
2193 _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p3, 0x4e));
2196 #undef dct_const
2197 #undef dct_rot
2198 #undef dct_widen
2199 #undef dct_wadd
2200 #undef dct_wsub
2201 #undef dct_bfly32o
2202 #undef dct_interleave8
2203 #undef dct_interleave16
2204 #undef dct_pass
2207 #endif // STBI_SSE2
2209 #ifdef STBI_NEON
2211 // NEON integer IDCT. should produce bit-identical
2212 // results to the generic C version.
2213 static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
2215 int16x8_t row0, row1, row2, row3, row4, row5, row6, row7;
2217 int16x4_t rot0_0 = vdup_n_s16(stbi__f2f(0.5411961f));
2218 int16x4_t rot0_1 = vdup_n_s16(stbi__f2f(-1.847759065f));
2219 int16x4_t rot0_2 = vdup_n_s16(stbi__f2f( 0.765366865f));
2220 int16x4_t rot1_0 = vdup_n_s16(stbi__f2f( 1.175875602f));
2221 int16x4_t rot1_1 = vdup_n_s16(stbi__f2f(-0.899976223f));
2222 int16x4_t rot1_2 = vdup_n_s16(stbi__f2f(-2.562915447f));
2223 int16x4_t rot2_0 = vdup_n_s16(stbi__f2f(-1.961570560f));
2224 int16x4_t rot2_1 = vdup_n_s16(stbi__f2f(-0.390180644f));
2225 int16x4_t rot3_0 = vdup_n_s16(stbi__f2f( 0.298631336f));
2226 int16x4_t rot3_1 = vdup_n_s16(stbi__f2f( 2.053119869f));
2227 int16x4_t rot3_2 = vdup_n_s16(stbi__f2f( 3.072711026f));
2228 int16x4_t rot3_3 = vdup_n_s16(stbi__f2f( 1.501321110f));
2230 #define dct_long_mul(out, inq, coeff) \
2231 int32x4_t out##_l = vmull_s16(vget_low_s16(inq), coeff); \
2232 int32x4_t out##_h = vmull_s16(vget_high_s16(inq), coeff)
2234 #define dct_long_mac(out, acc, inq, coeff) \
2235 int32x4_t out##_l = vmlal_s16(acc##_l, vget_low_s16(inq), coeff); \
2236 int32x4_t out##_h = vmlal_s16(acc##_h, vget_high_s16(inq), coeff)
2238 #define dct_widen(out, inq) \
2239 int32x4_t out##_l = vshll_n_s16(vget_low_s16(inq), 12); \
2240 int32x4_t out##_h = vshll_n_s16(vget_high_s16(inq), 12)
2242 // wide add
2243 #define dct_wadd(out, a, b) \
2244 int32x4_t out##_l = vaddq_s32(a##_l, b##_l); \
2245 int32x4_t out##_h = vaddq_s32(a##_h, b##_h)
2247 // wide sub
2248 #define dct_wsub(out, a, b) \
2249 int32x4_t out##_l = vsubq_s32(a##_l, b##_l); \
2250 int32x4_t out##_h = vsubq_s32(a##_h, b##_h)
2252 // butterfly a/b, then shift using "shiftop" by "s" and pack
2253 #define dct_bfly32o(out0,out1, a,b,shiftop,s) \
2255 dct_wadd(sum, a, b); \
2256 dct_wsub(dif, a, b); \
2257 out0 = vcombine_s16(shiftop(sum_l, s), shiftop(sum_h, s)); \
2258 out1 = vcombine_s16(shiftop(dif_l, s), shiftop(dif_h, s)); \
2261 #define dct_pass(shiftop, shift) \
2263 /* even part */ \
2264 int16x8_t sum26 = vaddq_s16(row2, row6); \
2265 dct_long_mul(p1e, sum26, rot0_0); \
2266 dct_long_mac(t2e, p1e, row6, rot0_1); \
2267 dct_long_mac(t3e, p1e, row2, rot0_2); \
2268 int16x8_t sum04 = vaddq_s16(row0, row4); \
2269 int16x8_t dif04 = vsubq_s16(row0, row4); \
2270 dct_widen(t0e, sum04); \
2271 dct_widen(t1e, dif04); \
2272 dct_wadd(x0, t0e, t3e); \
2273 dct_wsub(x3, t0e, t3e); \
2274 dct_wadd(x1, t1e, t2e); \
2275 dct_wsub(x2, t1e, t2e); \
2276 /* odd part */ \
2277 int16x8_t sum15 = vaddq_s16(row1, row5); \
2278 int16x8_t sum17 = vaddq_s16(row1, row7); \
2279 int16x8_t sum35 = vaddq_s16(row3, row5); \
2280 int16x8_t sum37 = vaddq_s16(row3, row7); \
2281 int16x8_t sumodd = vaddq_s16(sum17, sum35); \
2282 dct_long_mul(p5o, sumodd, rot1_0); \
2283 dct_long_mac(p1o, p5o, sum17, rot1_1); \
2284 dct_long_mac(p2o, p5o, sum35, rot1_2); \
2285 dct_long_mul(p3o, sum37, rot2_0); \
2286 dct_long_mul(p4o, sum15, rot2_1); \
2287 dct_wadd(sump13o, p1o, p3o); \
2288 dct_wadd(sump24o, p2o, p4o); \
2289 dct_wadd(sump23o, p2o, p3o); \
2290 dct_wadd(sump14o, p1o, p4o); \
2291 dct_long_mac(x4, sump13o, row7, rot3_0); \
2292 dct_long_mac(x5, sump24o, row5, rot3_1); \
2293 dct_long_mac(x6, sump23o, row3, rot3_2); \
2294 dct_long_mac(x7, sump14o, row1, rot3_3); \
2295 dct_bfly32o(row0,row7, x0,x7,shiftop,shift); \
2296 dct_bfly32o(row1,row6, x1,x6,shiftop,shift); \
2297 dct_bfly32o(row2,row5, x2,x5,shiftop,shift); \
2298 dct_bfly32o(row3,row4, x3,x4,shiftop,shift); \
2301 // load
2302 row0 = vld1q_s16(data + 0*8);
2303 row1 = vld1q_s16(data + 1*8);
2304 row2 = vld1q_s16(data + 2*8);
2305 row3 = vld1q_s16(data + 3*8);
2306 row4 = vld1q_s16(data + 4*8);
2307 row5 = vld1q_s16(data + 5*8);
2308 row6 = vld1q_s16(data + 6*8);
2309 row7 = vld1q_s16(data + 7*8);
2311 // add DC bias
2312 row0 = vaddq_s16(row0, vsetq_lane_s16(1024, vdupq_n_s16(0), 0));
2314 // column pass
2315 dct_pass(vrshrn_n_s32, 10);
2317 // 16bit 8x8 transpose
2319 // these three map to a single VTRN.16, VTRN.32, and VSWP, respectively.
2320 // whether compilers actually get this is another story, sadly.
2321 #define dct_trn16(x, y) { int16x8x2_t t = vtrnq_s16(x, y); x = t.val[0]; y = t.val[1]; }
2322 #define dct_trn32(x, y) { int32x4x2_t t = vtrnq_s32(vreinterpretq_s32_s16(x), vreinterpretq_s32_s16(y)); x = vreinterpretq_s16_s32(t.val[0]); y = vreinterpretq_s16_s32(t.val[1]); }
2323 #define dct_trn64(x, y) { int16x8_t x0 = x; int16x8_t y0 = y; x = vcombine_s16(vget_low_s16(x0), vget_low_s16(y0)); y = vcombine_s16(vget_high_s16(x0), vget_high_s16(y0)); }
2325 // pass 1
2326 dct_trn16(row0, row1); // a0b0a2b2a4b4a6b6
2327 dct_trn16(row2, row3);
2328 dct_trn16(row4, row5);
2329 dct_trn16(row6, row7);
2331 // pass 2
2332 dct_trn32(row0, row2); // a0b0c0d0a4b4c4d4
2333 dct_trn32(row1, row3);
2334 dct_trn32(row4, row6);
2335 dct_trn32(row5, row7);
2337 // pass 3
2338 dct_trn64(row0, row4); // a0b0c0d0e0f0g0h0
2339 dct_trn64(row1, row5);
2340 dct_trn64(row2, row6);
2341 dct_trn64(row3, row7);
2343 #undef dct_trn16
2344 #undef dct_trn32
2345 #undef dct_trn64
2348 // row pass
2349 // vrshrn_n_s32 only supports shifts up to 16, we need
2350 // 17. so do a non-rounding shift of 16 first then follow
2351 // up with a rounding shift by 1.
2352 dct_pass(vshrn_n_s32, 16);
2355 // pack and round
2356 uint8x8_t p0 = vqrshrun_n_s16(row0, 1);
2357 uint8x8_t p1 = vqrshrun_n_s16(row1, 1);
2358 uint8x8_t p2 = vqrshrun_n_s16(row2, 1);
2359 uint8x8_t p3 = vqrshrun_n_s16(row3, 1);
2360 uint8x8_t p4 = vqrshrun_n_s16(row4, 1);
2361 uint8x8_t p5 = vqrshrun_n_s16(row5, 1);
2362 uint8x8_t p6 = vqrshrun_n_s16(row6, 1);
2363 uint8x8_t p7 = vqrshrun_n_s16(row7, 1);
2365 // again, these can translate into one instruction, but often don't.
2366 #define dct_trn8_8(x, y) { uint8x8x2_t t = vtrn_u8(x, y); x = t.val[0]; y = t.val[1]; }
2367 #define dct_trn8_16(x, y) { uint16x4x2_t t = vtrn_u16(vreinterpret_u16_u8(x), vreinterpret_u16_u8(y)); x = vreinterpret_u8_u16(t.val[0]); y = vreinterpret_u8_u16(t.val[1]); }
2368 #define dct_trn8_32(x, y) { uint32x2x2_t t = vtrn_u32(vreinterpret_u32_u8(x), vreinterpret_u32_u8(y)); x = vreinterpret_u8_u32(t.val[0]); y = vreinterpret_u8_u32(t.val[1]); }
2370 // sadly can't use interleaved stores here since we only write
2371 // 8 bytes to each scan line!
2373 // 8x8 8-bit transpose pass 1
2374 dct_trn8_8(p0, p1);
2375 dct_trn8_8(p2, p3);
2376 dct_trn8_8(p4, p5);
2377 dct_trn8_8(p6, p7);
2379 // pass 2
2380 dct_trn8_16(p0, p2);
2381 dct_trn8_16(p1, p3);
2382 dct_trn8_16(p4, p6);
2383 dct_trn8_16(p5, p7);
2385 // pass 3
2386 dct_trn8_32(p0, p4);
2387 dct_trn8_32(p1, p5);
2388 dct_trn8_32(p2, p6);
2389 dct_trn8_32(p3, p7);
2391 // store
2392 vst1_u8(out, p0); out += out_stride;
2393 vst1_u8(out, p1); out += out_stride;
2394 vst1_u8(out, p2); out += out_stride;
2395 vst1_u8(out, p3); out += out_stride;
2396 vst1_u8(out, p4); out += out_stride;
2397 vst1_u8(out, p5); out += out_stride;
2398 vst1_u8(out, p6); out += out_stride;
2399 vst1_u8(out, p7);
2401 #undef dct_trn8_8
2402 #undef dct_trn8_16
2403 #undef dct_trn8_32
2406 #undef dct_long_mul
2407 #undef dct_long_mac
2408 #undef dct_widen
2409 #undef dct_wadd
2410 #undef dct_wsub
2411 #undef dct_bfly32o
2412 #undef dct_pass
2415 #endif // STBI_NEON
#define STBI__MARKER_none  0xff
// if there's a pending marker from the entropy stream, return that
// otherwise, fetch from the stream and get a marker. if there's no
// marker, return 0xff, which is never a valid marker value
static stbi_uc stbi__get_marker(stbi__jpeg *j)
{
   stbi_uc x;
   if (j->marker != STBI__MARKER_none) { x = j->marker; j->marker = STBI__MARKER_none; return x; }
   x = stbi__get8(j->s);
   if (x != 0xff) return STBI__MARKER_none;
   while (x == 0xff)
      x = stbi__get8(j->s);
   return x;
}
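// Illustration only, not part of stb_image: a sketch of the same marker scan
// over a plain memory buffer. ex_reader, ex_get8 and example_next_marker are
// hypothetical names, and the pending-marker cache is left out. The reader
// is assumed to return 0 once the buffer is exhausted.

```c
#define EX_MARKER_NONE 0xff

/* Hypothetical byte reader over a memory buffer. */
typedef struct {
    const unsigned char *p;
    const unsigned char *end;
} ex_reader;

/* Return the next byte, or 0 at end of input. */
static unsigned char ex_get8(ex_reader *r)
{
    return (r->p < r->end) ? *r->p++ : 0;
}

/* A marker is 0xff followed by a non-0xff byte. Any run of extra 0xff
   bytes is fill and is skipped. If the first byte is not 0xff there is
   no marker here, so report EX_MARKER_NONE. */
static unsigned char example_next_marker(ex_reader *r)
{
    unsigned char x = ex_get8(r);
    if (x != 0xff) return EX_MARKER_NONE;
    while (x == 0xff)
        x = ex_get8(r);
    return x;
}
```

// On the bytes ff ff d8 12, the first call returns 0xd8 (skipping the fill
// byte) and the second call returns EX_MARKER_NONE because 0x12 is not 0xff.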
2432 // in each scan, we'll have scan_n components, and the order
2433 // of the components is specified by order[]
2434 #define STBI__RESTART(x) ((x) >= 0xd0 && (x) <= 0xd7)
// after a restart interval, reset the entropy decoder and
// the dc prediction
static void stbi__jpeg_reset(stbi__jpeg *j)
{
   j->code_bits = 0;
   j->code_buffer = 0;
   j->nomore = 0;
   j->img_comp[0].dc_pred = j->img_comp[1].dc_pred = j->img_comp[2].dc_pred = 0;
   j->marker = STBI__MARKER_none;
   j->todo = j->restart_interval ? j->restart_interval : 0x7fffffff;
   j->eob_run = 0;
   // no more than 1<<31 MCUs if no restart_interval? that's plenty safe,
   // since we don't even allow 1<<30 pixels
}
2451 static int stbi__parse_entropy_coded_data(stbi__jpeg *z)
2453 stbi__jpeg_reset(z);
2454 if (!z->progressive) {
2455 if (z->scan_n == 1) {
2456 int i,j;
2457 STBI_SIMD_ALIGN(short, data[64]);
2458 int n = z->order[0];
2459 // non-interleaved data, we just need to process one block at a time,
2460 // in trivial scanline order
2461 // number of blocks to do just depends on how many actual "pixels" this
2462 // component has, independent of interleaved MCU blocking and such
2463 int w = (z->img_comp[n].x+7) >> 3;
2464 int h = (z->img_comp[n].y+7) >> 3;
2465 for (j=0; j < h; ++j) {
2466 for (i=0; i < w; ++i) {
2467 int ha = z->img_comp[n].ha;
2468 if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
2469 z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
// every data block is an MCU, so count down the restart interval
2471 if (--z->todo <= 0) {
2472 if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2473 // if it's NOT a restart, then just bail, so we get corrupt data
2474 // rather than no data
2475 if (!STBI__RESTART(z->marker)) return 1;
2476 stbi__jpeg_reset(z);
2480 return 1;
2481 } else { // interleaved
2482 int i,j,k,x,y;
2483 STBI_SIMD_ALIGN(short, data[64]);
2484 for (j=0; j < z->img_mcu_y; ++j) {
2485 for (i=0; i < z->img_mcu_x; ++i) {
2486 // scan an interleaved mcu... process scan_n components in order
2487 for (k=0; k < z->scan_n; ++k) {
2488 int n = z->order[k];
2489 // scan out an mcu's worth of this component; that's just determined
2490 // by the basic H and V specified for the component
2491 for (y=0; y < z->img_comp[n].v; ++y) {
2492 for (x=0; x < z->img_comp[n].h; ++x) {
2493 int x2 = (i*z->img_comp[n].h + x)*8;
2494 int y2 = (j*z->img_comp[n].v + y)*8;
2495 int ha = z->img_comp[n].ha;
2496 if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
2497 z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*y2+x2, z->img_comp[n].w2, data);
2501 // after all interleaved components, that's an interleaved MCU,
2502 // so now count down the restart interval
2503 if (--z->todo <= 0) {
2504 if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2505 if (!STBI__RESTART(z->marker)) return 1;
2506 stbi__jpeg_reset(z);
2510 return 1;
2512 } else {
2513 if (z->scan_n == 1) {
2514 int i,j;
2515 int n = z->order[0];
2516 // non-interleaved data, we just need to process one block at a time,
2517 // in trivial scanline order
2518 // number of blocks to do just depends on how many actual "pixels" this
2519 // component has, independent of interleaved MCU blocking and such
2520 int w = (z->img_comp[n].x+7) >> 3;
2521 int h = (z->img_comp[n].y+7) >> 3;
2522 for (j=0; j < h; ++j) {
2523 for (i=0; i < w; ++i) {
2524 short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
2525 if (z->spec_start == 0) {
2526 if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
2527 return 0;
2528 } else {
2529 int ha = z->img_comp[n].ha;
2530 if (!stbi__jpeg_decode_block_prog_ac(z, data, &z->huff_ac[ha], z->fast_ac[ha]))
2531 return 0;
// every data block is an MCU, so count down the restart interval
2534 if (--z->todo <= 0) {
2535 if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2536 if (!STBI__RESTART(z->marker)) return 1;
2537 stbi__jpeg_reset(z);
2541 return 1;
2542 } else { // interleaved
2543 int i,j,k,x,y;
2544 for (j=0; j < z->img_mcu_y; ++j) {
2545 for (i=0; i < z->img_mcu_x; ++i) {
2546 // scan an interleaved mcu... process scan_n components in order
2547 for (k=0; k < z->scan_n; ++k) {
2548 int n = z->order[k];
2549 // scan out an mcu's worth of this component; that's just determined
2550 // by the basic H and V specified for the component
2551 for (y=0; y < z->img_comp[n].v; ++y) {
2552 for (x=0; x < z->img_comp[n].h; ++x) {
2553 int x2 = (i*z->img_comp[n].h + x);
2554 int y2 = (j*z->img_comp[n].v + y);
2555 short *data = z->img_comp[n].coeff + 64 * (x2 + y2 * z->img_comp[n].coeff_w);
2556 if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
2557 return 0;
2561 // after all interleaved components, that's an interleaved MCU,
2562 // so now count down the restart interval
2563 if (--z->todo <= 0) {
2564 if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2565 if (!STBI__RESTART(z->marker)) return 1;
2566 stbi__jpeg_reset(z);
2570 return 1;
static void stbi__jpeg_dequantize(short *data, stbi_uc *dequant)
{
   int i;
   for (i=0; i < 64; ++i)
      data[i] *= dequant[i];
}
2582 static void stbi__jpeg_finish(stbi__jpeg *z)
2584 if (z->progressive) {
2585 // dequantize and idct the data
2586 int i,j,n;
2587 for (n=0; n < z->s->img_n; ++n) {
2588 int w = (z->img_comp[n].x+7) >> 3;
2589 int h = (z->img_comp[n].y+7) >> 3;
2590 for (j=0; j < h; ++j) {
2591 for (i=0; i < w; ++i) {
2592 short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
2593 stbi__jpeg_dequantize(data, z->dequant[z->img_comp[n].tq]);
2594 z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
2601 static int stbi__process_marker(stbi__jpeg *z, int m)
2603 int L;
2604 switch (m) {
2605 case STBI__MARKER_none: // no marker found
2606 return stbi__err("expected marker","Corrupt JPEG");
2608 case 0xDD: // DRI - specify restart interval
2609 if (stbi__get16be(z->s) != 4) return stbi__err("bad DRI len","Corrupt JPEG");
2610 z->restart_interval = stbi__get16be(z->s);
2611 return 1;
2613 case 0xDB: // DQT - define quantization table
2614 L = stbi__get16be(z->s)-2;
2615 while (L > 0) {
2616 int q = stbi__get8(z->s);
2617 int p = q >> 4;
2618 int t = q & 15,i;
2619 if (p != 0) return stbi__err("bad DQT type","Corrupt JPEG");
2620 if (t > 3) return stbi__err("bad DQT table","Corrupt JPEG");
2621 for (i=0; i < 64; ++i)
2622 z->dequant[t][stbi__jpeg_dezigzag[i]] = stbi__get8(z->s);
2623 L -= 65;
2625 return L==0;
2627 case 0xC4: // DHT - define huffman table
2628 L = stbi__get16be(z->s)-2;
2629 while (L > 0) {
2630 stbi_uc *v;
2631 int sizes[16],i,n=0;
2632 int q = stbi__get8(z->s);
2633 int tc = q >> 4;
2634 int th = q & 15;
2635 if (tc > 1 || th > 3) return stbi__err("bad DHT header","Corrupt JPEG");
2636 for (i=0; i < 16; ++i) {
2637 sizes[i] = stbi__get8(z->s);
2638 n += sizes[i];
2640 L -= 17;
2641 if (tc == 0) {
2642 if (!stbi__build_huffman(z->huff_dc+th, sizes)) return 0;
2643 v = z->huff_dc[th].values;
2644 } else {
2645 if (!stbi__build_huffman(z->huff_ac+th, sizes)) return 0;
2646 v = z->huff_ac[th].values;
2648 for (i=0; i < n; ++i)
2649 v[i] = stbi__get8(z->s);
2650 if (tc != 0)
2651 stbi__build_fast_ac(z->fast_ac[th], z->huff_ac + th);
2652 L -= n;
2654 return L==0;
2656 // check for comment block or APP blocks
2657 if ((m >= 0xE0 && m <= 0xEF) || m == 0xFE) {
2658 stbi__skip(z->s, stbi__get16be(z->s)-2);
2659 return 1;
2661 return 0;
2664 // after we see SOS
2665 static int stbi__process_scan_header(stbi__jpeg *z)
2667 int i;
2668 int Ls = stbi__get16be(z->s);
2669 z->scan_n = stbi__get8(z->s);
2670 if (z->scan_n < 1 || z->scan_n > 4 || z->scan_n > (int) z->s->img_n) return stbi__err("bad SOS component count","Corrupt JPEG");
2671 if (Ls != 6+2*z->scan_n) return stbi__err("bad SOS len","Corrupt JPEG");
2672 for (i=0; i < z->scan_n; ++i) {
2673 int id = stbi__get8(z->s), which;
2674 int q = stbi__get8(z->s);
2675 for (which = 0; which < z->s->img_n; ++which)
2676 if (z->img_comp[which].id == id)
2677 break;
2678 if (which == z->s->img_n) return 0; // no match
2679 z->img_comp[which].hd = q >> 4; if (z->img_comp[which].hd > 3) return stbi__err("bad DC huff","Corrupt JPEG");
2680 z->img_comp[which].ha = q & 15; if (z->img_comp[which].ha > 3) return stbi__err("bad AC huff","Corrupt JPEG");
2681 z->order[i] = which;
2685 int aa;
2686 z->spec_start = stbi__get8(z->s);
2687 z->spec_end = stbi__get8(z->s); // should be 63, but might be 0
2688 aa = stbi__get8(z->s);
2689 z->succ_high = (aa >> 4);
2690 z->succ_low = (aa & 15);
2691 if (z->progressive) {
2692 if (z->spec_start > 63 || z->spec_end > 63 || z->spec_start > z->spec_end || z->succ_high > 13 || z->succ_low > 13)
2693 return stbi__err("bad SOS", "Corrupt JPEG");
2694 } else {
2695 if (z->spec_start != 0) return stbi__err("bad SOS","Corrupt JPEG");
2696 if (z->succ_high != 0 || z->succ_low != 0) return stbi__err("bad SOS","Corrupt JPEG");
2697 z->spec_end = 63;
2701 return 1;
2704 static int stbi__process_frame_header(stbi__jpeg *z, int scan)
2706 stbi__context *s = z->s;
2707 int Lf,p,i,q, h_max=1,v_max=1,c;
2708 Lf = stbi__get16be(s); if (Lf < 11) return stbi__err("bad SOF len","Corrupt JPEG"); // JPEG
2709 p = stbi__get8(s); if (p != 8) return stbi__err("only 8-bit","JPEG format not supported: 8-bit only"); // JPEG baseline
2710 s->img_y = stbi__get16be(s); if (s->img_y == 0) return stbi__err("no header height", "JPEG format not supported: delayed height"); // Legal, but we don't handle it--but neither does IJG
2711 s->img_x = stbi__get16be(s); if (s->img_x == 0) return stbi__err("0 width","Corrupt JPEG"); // JPEG requires
2712 c = stbi__get8(s);
2713 if (c != 3 && c != 1) return stbi__err("bad component count","Corrupt JPEG"); // JFIF requires
2714 s->img_n = c;
2715 for (i=0; i < c; ++i) {
2716 z->img_comp[i].data = NULL;
2717 z->img_comp[i].linebuf = NULL;
2720 if (Lf != 8+3*s->img_n) return stbi__err("bad SOF len","Corrupt JPEG");
2722 for (i=0; i < s->img_n; ++i) {
2723 z->img_comp[i].id = stbi__get8(s);
2724 if (z->img_comp[i].id != i+1) // JFIF requires
2725 if (z->img_comp[i].id != i) // some version of jpegtran outputs non-JFIF-compliant files!
2726 return stbi__err("bad component ID","Corrupt JPEG");
2727 q = stbi__get8(s);
2728 z->img_comp[i].h = (q >> 4); if (!z->img_comp[i].h || z->img_comp[i].h > 4) return stbi__err("bad H","Corrupt JPEG");
2729 z->img_comp[i].v = q & 15; if (!z->img_comp[i].v || z->img_comp[i].v > 4) return stbi__err("bad V","Corrupt JPEG");
2730 z->img_comp[i].tq = stbi__get8(s); if (z->img_comp[i].tq > 3) return stbi__err("bad TQ","Corrupt JPEG");
2733 if (scan != STBI__SCAN_load) return 1;
2735 if ((1 << 30) / s->img_x / s->img_n < s->img_y) return stbi__err("too large", "Image too large to decode");
2737 for (i=0; i < s->img_n; ++i) {
2738 if (z->img_comp[i].h > h_max) h_max = z->img_comp[i].h;
2739 if (z->img_comp[i].v > v_max) v_max = z->img_comp[i].v;
2742 // compute interleaved mcu info
2743 z->img_h_max = h_max;
2744 z->img_v_max = v_max;
2745 z->img_mcu_w = h_max * 8;
2746 z->img_mcu_h = v_max * 8;
2747 z->img_mcu_x = (s->img_x + z->img_mcu_w-1) / z->img_mcu_w;
2748 z->img_mcu_y = (s->img_y + z->img_mcu_h-1) / z->img_mcu_h;
2750 for (i=0; i < s->img_n; ++i) {
2751 // number of effective pixels (e.g. for non-interleaved MCU)
2752 z->img_comp[i].x = (s->img_x * z->img_comp[i].h + h_max-1) / h_max;
2753 z->img_comp[i].y = (s->img_y * z->img_comp[i].v + v_max-1) / v_max;
2754 // to simplify generation, we'll allocate enough memory to decode
2755 // the bogus oversized data from using interleaved MCUs and their
2756 // big blocks (e.g. a 16x16 iMCU on an image of width 33); we won't
2757 // discard the extra data until colorspace conversion
2758 z->img_comp[i].w2 = z->img_mcu_x * z->img_comp[i].h * 8;
2759 z->img_comp[i].h2 = z->img_mcu_y * z->img_comp[i].v * 8;
2760 z->img_comp[i].raw_data = stbi__malloc(z->img_comp[i].w2 * z->img_comp[i].h2+15);
2762 if (z->img_comp[i].raw_data == NULL) {
2763 for(--i; i >= 0; --i) {
2764 STBI_FREE(z->img_comp[i].raw_data);
2765 z->img_comp[i].raw_data = NULL;
2767 return stbi__err("outofmem", "Out of memory");
2769 // align blocks for idct using mmx/sse
2770 z->img_comp[i].data = (stbi_uc*) (((size_t) z->img_comp[i].raw_data + 15) & ~15);
2771 z->img_comp[i].linebuf = NULL;
2772 if (z->progressive) {
2773 z->img_comp[i].coeff_w = (z->img_comp[i].w2 + 7) >> 3;
2774 z->img_comp[i].coeff_h = (z->img_comp[i].h2 + 7) >> 3;
2775 z->img_comp[i].raw_coeff = STBI_MALLOC(z->img_comp[i].coeff_w * z->img_comp[i].coeff_h * 64 * sizeof(short) + 15);
2776 z->img_comp[i].coeff = (short*) (((size_t) z->img_comp[i].raw_coeff + 15) & ~15);
2777 } else {
2778 z->img_comp[i].coeff = 0;
2779 z->img_comp[i].raw_coeff = 0;
2783 return 1;
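// Illustration only, not part of stb_image: a worked sketch of the MCU
// geometry computed above, for one axis. ex_extent and
// example_component_extent are hypothetical names; the three formulas follow
// img_mcu_x, img_comp[i].x and img_comp[i].w2 from the frame header code.

```c
/* Hypothetical result record for one component along one axis. */
typedef struct {
    int mcu_count;    /* number of interleaved MCUs covering the image */
    int comp_pixels;  /* effective samples this component really has */
    int padded;       /* samples allocated, rounded up to whole MCUs */
} ex_extent;

static ex_extent example_component_extent(int image_size, int sampling, int max_sampling)
{
    ex_extent e;
    int mcu_size = max_sampling * 8;                       /* e.g. 16 for 2x sampling */
    e.mcu_count   = (image_size + mcu_size - 1) / mcu_size;
    e.comp_pixels = (image_size * sampling + max_sampling - 1) / max_sampling;
    e.padded      = e.mcu_count * sampling * 8;
    return e;
}
```

// For the case mentioned in the comment above (width 33, 16-pixel iMCU), a
// component with sampling factor 2 gets 3 MCUs, 33 effective pixels and a
// padded row of 48; a component with factor 1 gets 17 effective pixels and
// a padded row of 24.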
2786 // use comparisons since in some cases we handle more than one case (e.g. SOF)
2787 #define stbi__DNL(x) ((x) == 0xdc)
2788 #define stbi__SOI(x) ((x) == 0xd8)
2789 #define stbi__EOI(x) ((x) == 0xd9)
2790 #define stbi__SOF(x) ((x) == 0xc0 || (x) == 0xc1 || (x) == 0xc2)
2791 #define stbi__SOS(x) ((x) == 0xda)
2793 #define stbi__SOF_progressive(x) ((x) == 0xc2)
2795 static int stbi__decode_jpeg_header(stbi__jpeg *z, int scan)
2797 int m;
2798 z->marker = STBI__MARKER_none; // initialize cached marker to empty
2799 m = stbi__get_marker(z);
2800 if (!stbi__SOI(m)) return stbi__err("no SOI","Corrupt JPEG");
2801 if (scan == STBI__SCAN_type) return 1;
2802 m = stbi__get_marker(z);
2803 while (!stbi__SOF(m)) {
2804 if (!stbi__process_marker(z,m)) return 0;
2805 m = stbi__get_marker(z);
2806 while (m == STBI__MARKER_none) {
2807 // some files have extra padding after their blocks, so ok, we'll scan
2808 if (stbi__at_eof(z->s)) return stbi__err("no SOF", "Corrupt JPEG");
2809 m = stbi__get_marker(z);
2812 z->progressive = stbi__SOF_progressive(m);
2813 if (!stbi__process_frame_header(z, scan)) return 0;
2814 return 1;
// decode image to YCbCr format
static int stbi__decode_jpeg_image(stbi__jpeg *j)
{
   int m;
   for (m = 0; m < 4; m++) {
      j->img_comp[m].raw_data = NULL;
      j->img_comp[m].raw_coeff = NULL;
   }
   j->restart_interval = 0;
   if (!stbi__decode_jpeg_header(j, STBI__SCAN_load)) return 0;
   m = stbi__get_marker(j);
   while (!stbi__EOI(m)) {
      if (stbi__SOS(m)) {
         if (!stbi__process_scan_header(j)) return 0;
         if (!stbi__parse_entropy_coded_data(j)) return 0;
         if (j->marker == STBI__MARKER_none ) {
            // handle 0s at the end of image data from IP Kamera 9060
            while (!stbi__at_eof(j->s)) {
               int x = stbi__get8(j->s);
               if (x == 255) {
                  j->marker = stbi__get8(j->s);
                  break;
               } else if (x != 0) {
                  return stbi__err("junk before marker", "Corrupt JPEG");
               }
            }
            // if we reach eof without hitting a marker, stbi__get_marker() below will fail and we'll eventually return 0
         }
      } else {
         if (!stbi__process_marker(j, m)) return 0;
      }
      m = stbi__get_marker(j);
   }
   if (j->progressive)
      stbi__jpeg_finish(j);
   return 1;
}
// static jfif-centered resampling (across block boundaries)

typedef stbi_uc *(*resample_row_func)(stbi_uc *out, stbi_uc *in0, stbi_uc *in1,
                                    int w, int hs);

#define stbi__div4(x) ((stbi_uc) ((x) >> 2))

static stbi_uc *resample_row_1(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
{
   STBI_NOTUSED(out);
   STBI_NOTUSED(in_far);
   STBI_NOTUSED(w);
   STBI_NOTUSED(hs);
   return in_near;
}

static stbi_uc* stbi__resample_row_v_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
{
   // need to generate two samples vertically for every one in input
   int i;
   STBI_NOTUSED(hs);
   for (i=0; i < w; ++i)
      out[i] = stbi__div4(3*in_near[i] + in_far[i] + 2);
   return out;
}

static stbi_uc*  stbi__resample_row_h_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
{
   // need to generate two samples horizontally for every one in input
   int i;
   stbi_uc *input = in_near;

   if (w == 1) {
      // if only one sample, can't do any interpolation
      out[0] = out[1] = input[0];
      return out;
   }

   out[0] = input[0];
   out[1] = stbi__div4(input[0]*3 + input[1] + 2);
   for (i=1; i < w-1; ++i) {
      int n = 3*input[i]+2;
      out[i*2+0] = stbi__div4(n+input[i-1]);
      out[i*2+1] = stbi__div4(n+input[i+1]);
   }
   out[i*2+0] = stbi__div4(input[w-2]*3 + input[w-1] + 2);
   out[i*2+1] = input[w-1];

   STBI_NOTUSED(in_far);
   STBI_NOTUSED(hs);

   return out;
}
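// Illustrative sketch, not part of stb_image: a standalone version of the
// 2x horizontal upsampling idea, where each output sample is
// (3*nearest + 1*neighbor + 2) / 4. The name sketch_upsample_h2 is
// hypothetical. Note one deliberate difference: this sketch applies the same
// 3:1 rule to the final even sample, whereas the listing above weights
// input[w-2] by 3 there.

```c
#include <assert.h>
#include <stdint.h>

/* Writes 2*w samples to out from w samples in in. */
void sketch_upsample_h2(uint8_t *out, const uint8_t *in, int w)
{
   int i;
   assert(w >= 1);
   if (w == 1) {                       /* nothing to interpolate against */
      out[0] = out[1] = in[0];
      return;
   }
   out[0] = in[0];                     /* left edge: replicate */
   out[1] = (uint8_t) ((3*in[0] + in[1] + 2) >> 2);
   for (i = 1; i < w-1; ++i) {
      int n = 3*in[i] + 2;             /* shared term, includes rounding bias */
      out[2*i]   = (uint8_t) ((n + in[i-1]) >> 2);
      out[2*i+1] = (uint8_t) ((n + in[i+1]) >> 2);
   }
   out[2*i]   = (uint8_t) ((3*in[w-1] + in[w-2] + 2) >> 2);
   out[2*i+1] = in[w-1];               /* right edge: replicate */
}
```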
#define stbi__div16(x) ((stbi_uc) ((x) >> 4))

static stbi_uc *stbi__resample_row_hv_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
{
   // need to generate 2x2 samples for every one in input
   int i,t0,t1;
   if (w == 1) {
      out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
      return out;
   }

   t1 = 3*in_near[0] + in_far[0];
   out[0] = stbi__div4(t1+2);
   for (i=1; i < w; ++i) {
      t0 = t1;
      t1 = 3*in_near[i]+in_far[i];
      out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
      out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
   }
   out[w*2-1] = stbi__div4(t1+2);

   STBI_NOTUSED(hs);

   return out;
}
#if defined(STBI_SSE2) || defined(STBI_NEON)
static stbi_uc *stbi__resample_row_hv_2_simd(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
{
   // need to generate 2x2 samples for every one in input
   int i=0,t0,t1;

   if (w == 1) {
      out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
      return out;
   }

   t1 = 3*in_near[0] + in_far[0];
   // process groups of 8 pixels for as long as we can.
   // note we can't handle the last pixel in a row in this loop
   // because we need to handle the filter boundary conditions.
   for (; i < ((w-1) & ~7); i += 8) {
#if defined(STBI_SSE2)
      // load and perform the vertical filtering pass
      // this uses 3*x + y = 4*x + (y - x)
      __m128i zero  = _mm_setzero_si128();
      __m128i farb  = _mm_loadl_epi64((__m128i *) (in_far + i));
      __m128i nearb = _mm_loadl_epi64((__m128i *) (in_near + i));
      __m128i farw  = _mm_unpacklo_epi8(farb, zero);
      __m128i nearw = _mm_unpacklo_epi8(nearb, zero);
      __m128i diff  = _mm_sub_epi16(farw, nearw);
      __m128i nears = _mm_slli_epi16(nearw, 2);
      __m128i curr  = _mm_add_epi16(nears, diff); // current row

      // horizontal filter works the same based on shifted vers of current
      // row. "prev" is current row shifted right by 1 pixel; we need to
      // insert the previous pixel value (from t1).
      // "next" is current row shifted left by 1 pixel, with first pixel
      // of next block of 8 pixels added in.
      __m128i prv0 = _mm_slli_si128(curr, 2);
      __m128i nxt0 = _mm_srli_si128(curr, 2);
      __m128i prev = _mm_insert_epi16(prv0, t1, 0);
      __m128i next = _mm_insert_epi16(nxt0, 3*in_near[i+8] + in_far[i+8], 7);

      // horizontal filter, polyphase implementation since it's convenient:
      // even pixels = 3*cur + prev = cur*4 + (prev - cur)
      // odd  pixels = 3*cur + next = cur*4 + (next - cur)
      // note the shared term.
      __m128i bias  = _mm_set1_epi16(8);
      __m128i curs = _mm_slli_epi16(curr, 2);
      __m128i prvd = _mm_sub_epi16(prev, curr);
      __m128i nxtd = _mm_sub_epi16(next, curr);
      __m128i curb = _mm_add_epi16(curs, bias);
      __m128i even = _mm_add_epi16(prvd, curb);
      __m128i odd  = _mm_add_epi16(nxtd, curb);

      // interleave even and odd pixels, then undo scaling.
      __m128i int0 = _mm_unpacklo_epi16(even, odd);
      __m128i int1 = _mm_unpackhi_epi16(even, odd);
      __m128i de0  = _mm_srli_epi16(int0, 4);
      __m128i de1  = _mm_srli_epi16(int1, 4);

      // pack and write output
      __m128i outv = _mm_packus_epi16(de0, de1);
      _mm_storeu_si128((__m128i *) (out + i*2), outv);
#elif defined(STBI_NEON)
      // load and perform the vertical filtering pass
      // this uses 3*x + y = 4*x + (y - x)
      uint8x8_t farb  = vld1_u8(in_far + i);
      uint8x8_t nearb = vld1_u8(in_near + i);
      int16x8_t diff  = vreinterpretq_s16_u16(vsubl_u8(farb, nearb));
      int16x8_t nears = vreinterpretq_s16_u16(vshll_n_u8(nearb, 2));
      int16x8_t curr  = vaddq_s16(nears, diff); // current row

      // horizontal filter works the same based on shifted vers of current
      // row. "prev" is current row shifted right by 1 pixel; we need to
      // insert the previous pixel value (from t1).
      // "next" is current row shifted left by 1 pixel, with first pixel
      // of next block of 8 pixels added in.
      int16x8_t prv0 = vextq_s16(curr, curr, 7);
      int16x8_t nxt0 = vextq_s16(curr, curr, 1);
      int16x8_t prev = vsetq_lane_s16(t1, prv0, 0);
      int16x8_t next = vsetq_lane_s16(3*in_near[i+8] + in_far[i+8], nxt0, 7);

      // horizontal filter, polyphase implementation since it's convenient:
      // even pixels = 3*cur + prev = cur*4 + (prev - cur)
      // odd  pixels = 3*cur + next = cur*4 + (next - cur)
      // note the shared term.
      int16x8_t curs = vshlq_n_s16(curr, 2);
      int16x8_t prvd = vsubq_s16(prev, curr);
      int16x8_t nxtd = vsubq_s16(next, curr);
      int16x8_t even = vaddq_s16(curs, prvd);
      int16x8_t odd  = vaddq_s16(curs, nxtd);

      // undo scaling and round, then store with even/odd phases interleaved
      uint8x8x2_t o;
      o.val[0] = vqrshrun_n_s16(even, 4);
      o.val[1] = vqrshrun_n_s16(odd,  4);
      vst2_u8(out + i*2, o);
#endif

      // "previous" value for next iter
      t1 = 3*in_near[i+7] + in_far[i+7];
   }

   t0 = t1;
   t1 = 3*in_near[i] + in_far[i];
   out[i*2] = stbi__div16(3*t1 + t0 + 8);

   for (++i; i < w; ++i) {
      t0 = t1;
      t1 = 3*in_near[i]+in_far[i];
      out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
      out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
   }
   out[w*2-1] = stbi__div4(t1+2);

   STBI_NOTUSED(hs);

   return out;
}
#endif
static stbi_uc *stbi__resample_row_generic(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
{
   // resample with nearest-neighbor
   int i,j;
   STBI_NOTUSED(in_far);
   for (i=0; i < w; ++i)
      for (j=0; j < hs; ++j)
         out[i*hs+j] = in_near[i];
   return out;
}
#ifdef STBI_JPEG_OLD
// this is the same YCbCr-to-RGB calculation that stb_image has used
// historically before the algorithm changes in 1.49
#define float2fixed(x)  ((int) ((x) * 65536 + 0.5))
static void stbi__YCbCr_to_RGB_row(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step)
{
   int i;
   for (i=0; i < count; ++i) {
      int y_fixed = (y[i] << 16) + 32768; // rounding
      int r,g,b;
      int cr = pcr[i] - 128;
      int cb = pcb[i] - 128;
      r = y_fixed + cr*float2fixed(1.40200f);
      g = y_fixed - cr*float2fixed(0.71414f) - cb*float2fixed(0.34414f);
      b = y_fixed                            + cb*float2fixed(1.77200f);
      r >>= 16;
      g >>= 16;
      b >>= 16;
      if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
      if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
      if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
      out[0] = (stbi_uc)r;
      out[1] = (stbi_uc)g;
      out[2] = (stbi_uc)b;
      out[3] = 255;
      out += step;
   }
}
#else
// this is a reduced-precision calculation of YCbCr-to-RGB introduced
// to make sure the code produces the same results in both SIMD and scalar
#define float2fixed(x)  (((int) ((x) * 4096.0f + 0.5f)) << 8)
static void stbi__YCbCr_to_RGB_row(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step)
{
   int i;
   for (i=0; i < count; ++i) {
      int y_fixed = (y[i] << 20) + (1<<19); // rounding
      int r,g,b;
      int cr = pcr[i] - 128;
      int cb = pcb[i] - 128;
      r = y_fixed +  cr* float2fixed(1.40200f);
      g = y_fixed + (cr*-float2fixed(0.71414f)) + ((cb*-float2fixed(0.34414f)) & 0xffff0000);
      b = y_fixed                               +   cb* float2fixed(1.77200f);
      r >>= 20;
      g >>= 20;
      b >>= 20;
      if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
      if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
      if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
      out[0] = (stbi_uc)r;
      out[1] = (stbi_uc)g;
      out[2] = (stbi_uc)b;
      out[3] = 255;
      out += step;
   }
}
#endif
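// Illustrative sketch, not part of stb_image: a single-pixel YCbCr-to-RGB
// conversion in 16.16 fixed point, following the shape of the STBI_JPEG_OLD
// path above. The names sketch_f2fix and sketch_ycbcr_to_rgb are
// hypothetical. Unlike the listing, this sketch clamps negative values
// before shifting so it does not rely on arithmetic right shift of negative
// ints.

```c
#include <assert.h>
#include <stdint.h>

#define sketch_f2fix(x)  ((int) ((x) * 65536 + 0.5))

/* Scale a 16.16 fixed-point value back to 0..255 with clamping. */
static uint8_t sketch_descale(int v)
{
   if (v < 0) return 0;
   v >>= 16;
   return (uint8_t) (v > 255 ? 255 : v);
}

void sketch_ycbcr_to_rgb(uint8_t y, uint8_t cb_in, uint8_t cr_in, uint8_t rgb[3])
{
   int y_fixed = ((int) y << 16) + 32768;   /* +0.5 for rounding */
   int cr = (int) cr_in - 128;              /* chroma is stored biased by 128 */
   int cb = (int) cb_in - 128;
   assert(rgb != 0);
   rgb[0] = sketch_descale(y_fixed + cr*sketch_f2fix(1.40200));
   rgb[1] = sketch_descale(y_fixed - cr*sketch_f2fix(0.71414) - cb*sketch_f2fix(0.34414));
   rgb[2] = sketch_descale(y_fixed + cb*sketch_f2fix(1.77200));
}
```

// With neutral chroma (Cb = Cr = 128) the output is gray: R = G = B = Y.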
#if defined(STBI_SSE2) || defined(STBI_NEON)
static void stbi__YCbCr_to_RGB_simd(stbi_uc *out, stbi_uc const *y, stbi_uc const *pcb, stbi_uc const *pcr, int count, int step)
{
   int i = 0;

#ifdef STBI_SSE2
   // step == 3 is pretty ugly on the final interleave, and i'm not convinced
   // it's useful in practice (you wouldn't use it for textures, for example).
   // so just accelerate step == 4 case.
   if (step == 4) {
      // this is a fairly straightforward implementation and not super-optimized.
      __m128i signflip  = _mm_set1_epi8(-0x80);
      __m128i cr_const0 = _mm_set1_epi16(   (short) ( 1.40200f*4096.0f+0.5f));
      __m128i cr_const1 = _mm_set1_epi16( - (short) ( 0.71414f*4096.0f+0.5f));
      __m128i cb_const0 = _mm_set1_epi16( - (short) ( 0.34414f*4096.0f+0.5f));
      __m128i cb_const1 = _mm_set1_epi16(   (short) ( 1.77200f*4096.0f+0.5f));
      __m128i y_bias = _mm_set1_epi8((char) (unsigned char) 128);
      __m128i xw = _mm_set1_epi16(255); // alpha channel

      for (; i+7 < count; i += 8) {
         // load
         __m128i y_bytes = _mm_loadl_epi64((__m128i *) (y+i));
         __m128i cr_bytes = _mm_loadl_epi64((__m128i *) (pcr+i));
         __m128i cb_bytes = _mm_loadl_epi64((__m128i *) (pcb+i));
         __m128i cr_biased = _mm_xor_si128(cr_bytes, signflip); // -128
         __m128i cb_biased = _mm_xor_si128(cb_bytes, signflip); // -128

         // unpack to short (and left-shift cr, cb by 8)
         __m128i yw  = _mm_unpacklo_epi8(y_bias, y_bytes);
         __m128i crw = _mm_unpacklo_epi8(_mm_setzero_si128(), cr_biased);
         __m128i cbw = _mm_unpacklo_epi8(_mm_setzero_si128(), cb_biased);

         // color transform
         __m128i yws = _mm_srli_epi16(yw, 4);
         __m128i cr0 = _mm_mulhi_epi16(cr_const0, crw);
         __m128i cb0 = _mm_mulhi_epi16(cb_const0, cbw);
         __m128i cb1 = _mm_mulhi_epi16(cbw, cb_const1);
         __m128i cr1 = _mm_mulhi_epi16(crw, cr_const1);
         __m128i rws = _mm_add_epi16(cr0, yws);
         __m128i gwt = _mm_add_epi16(cb0, yws);
         __m128i bws = _mm_add_epi16(yws, cb1);
         __m128i gws = _mm_add_epi16(gwt, cr1);

         // descale
         __m128i rw = _mm_srai_epi16(rws, 4);
         __m128i bw = _mm_srai_epi16(bws, 4);
         __m128i gw = _mm_srai_epi16(gws, 4);

         // back to byte, set up for transpose
         __m128i brb = _mm_packus_epi16(rw, bw);
         __m128i gxb = _mm_packus_epi16(gw, xw);

         // transpose to interleave channels
         __m128i t0 = _mm_unpacklo_epi8(brb, gxb);
         __m128i t1 = _mm_unpackhi_epi8(brb, gxb);
         __m128i o0 = _mm_unpacklo_epi16(t0, t1);
         __m128i o1 = _mm_unpackhi_epi16(t0, t1);

         // store
         _mm_storeu_si128((__m128i *) (out + 0), o0);
         _mm_storeu_si128((__m128i *) (out + 16), o1);
         out += 32;
      }
   }
#endif

#ifdef STBI_NEON
   // in this version, step=3 support would be easy to add. but is there demand?
   if (step == 4) {
      // this is a fairly straightforward implementation and not super-optimized.
      uint8x8_t signflip = vdup_n_u8(0x80);
      int16x8_t cr_const0 = vdupq_n_s16(   (short) ( 1.40200f*4096.0f+0.5f));
      int16x8_t cr_const1 = vdupq_n_s16( - (short) ( 0.71414f*4096.0f+0.5f));
      int16x8_t cb_const0 = vdupq_n_s16( - (short) ( 0.34414f*4096.0f+0.5f));
      int16x8_t cb_const1 = vdupq_n_s16(   (short) ( 1.77200f*4096.0f+0.5f));

      for (; i+7 < count; i += 8) {
         // load
         uint8x8_t y_bytes  = vld1_u8(y + i);
         uint8x8_t cr_bytes = vld1_u8(pcr + i);
         uint8x8_t cb_bytes = vld1_u8(pcb + i);
         int8x8_t cr_biased = vreinterpret_s8_u8(vsub_u8(cr_bytes, signflip));
         int8x8_t cb_biased = vreinterpret_s8_u8(vsub_u8(cb_bytes, signflip));

         // expand to s16
         int16x8_t yws = vreinterpretq_s16_u16(vshll_n_u8(y_bytes, 4));
         int16x8_t crw = vshll_n_s8(cr_biased, 7);
         int16x8_t cbw = vshll_n_s8(cb_biased, 7);

         // color transform
         int16x8_t cr0 = vqdmulhq_s16(crw, cr_const0);
         int16x8_t cb0 = vqdmulhq_s16(cbw, cb_const0);
         int16x8_t cr1 = vqdmulhq_s16(crw, cr_const1);
         int16x8_t cb1 = vqdmulhq_s16(cbw, cb_const1);
         int16x8_t rws = vaddq_s16(yws, cr0);
         int16x8_t gws = vaddq_s16(vaddq_s16(yws, cb0), cr1);
         int16x8_t bws = vaddq_s16(yws, cb1);

         // undo scaling, round, convert to byte
         uint8x8x4_t o;
         o.val[0] = vqrshrun_n_s16(rws, 4);
         o.val[1] = vqrshrun_n_s16(gws, 4);
         o.val[2] = vqrshrun_n_s16(bws, 4);
         o.val[3] = vdup_n_u8(255);

         // store, interleaving r/g/b/a
         vst4_u8(out, o);
         out += 8*4;
      }
   }
#endif

   for (; i < count; ++i) {
      int y_fixed = (y[i] << 20) + (1<<19); // rounding
      int r,g,b;
      int cr = pcr[i] - 128;
      int cb = pcb[i] - 128;
      r = y_fixed + cr* float2fixed(1.40200f);
      g = y_fixed + cr*-float2fixed(0.71414f) + ((cb*-float2fixed(0.34414f)) & 0xffff0000);
      b = y_fixed                             +   cb* float2fixed(1.77200f);
      r >>= 20;
      g >>= 20;
      b >>= 20;
      if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
      if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
      if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
      out[0] = (stbi_uc)r;
      out[1] = (stbi_uc)g;
      out[2] = (stbi_uc)b;
      out[3] = 255;
      out += step;
   }
}
#endif
// set up the kernels
static void stbi__setup_jpeg(stbi__jpeg *j)
{
   j->idct_block_kernel = stbi__idct_block;
   j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_row;
   j->resample_row_hv_2_kernel = stbi__resample_row_hv_2;

#ifdef STBI_SSE2
   if (stbi__sse2_available()) {
      j->idct_block_kernel = stbi__idct_simd;
      #ifndef STBI_JPEG_OLD
      j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
      #endif
      j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
   }
#endif

#ifdef STBI_NEON
   j->idct_block_kernel = stbi__idct_simd;
   #ifndef STBI_JPEG_OLD
   j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
   #endif
   j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
#endif
}

// clean up the temporary component buffers
static void stbi__cleanup_jpeg(stbi__jpeg *j)
{
   int i;
   for (i=0; i < j->s->img_n; ++i) {
      if (j->img_comp[i].raw_data) {
         STBI_FREE(j->img_comp[i].raw_data);
         j->img_comp[i].raw_data = NULL;
         j->img_comp[i].data = NULL;
      }
      if (j->img_comp[i].raw_coeff) {
         STBI_FREE(j->img_comp[i].raw_coeff);
         j->img_comp[i].raw_coeff = 0;
         j->img_comp[i].coeff = 0;
      }
      if (j->img_comp[i].linebuf) {
         STBI_FREE(j->img_comp[i].linebuf);
         j->img_comp[i].linebuf = NULL;
      }
   }
}
typedef struct
{
   resample_row_func resample;
   stbi_uc *line0,*line1;
   int hs,vs;   // expansion factor in each axis
   int w_lores; // horizontal pixels pre-expansion
   int ystep;   // how far through vertical expansion we are
   int ypos;    // which pre-expansion row we're on
} stbi__resample;

static stbi_uc *load_jpeg_image(stbi__jpeg *z, int *out_x, int *out_y, int *comp, int req_comp)
{
   int n, decode_n;
   z->s->img_n = 0; // make stbi__cleanup_jpeg safe

   // validate req_comp
   if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");

   // load a jpeg image from whichever source, but leave in YCbCr format
   if (!stbi__decode_jpeg_image(z)) { stbi__cleanup_jpeg(z); return NULL; }

   // determine actual number of components to generate
   n = req_comp ? req_comp : z->s->img_n;

   if (z->s->img_n == 3 && n < 3)
      decode_n = 1;
   else
      decode_n = z->s->img_n;

   // resample and color-convert
   {
      int k;
      unsigned int i,j;
      stbi_uc *output;
      stbi_uc *coutput[4];

      stbi__resample res_comp[4];

      for (k=0; k < decode_n; ++k) {
         stbi__resample *r = &res_comp[k];

         // allocate line buffer big enough for upsampling off the edges
         // with upsample factor of 4
         z->img_comp[k].linebuf = (stbi_uc *) stbi__malloc(z->s->img_x + 3);
         if (!z->img_comp[k].linebuf) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }

         r->hs      = z->img_h_max / z->img_comp[k].h;
         r->vs      = z->img_v_max / z->img_comp[k].v;
         r->ystep   = r->vs >> 1;
         r->w_lores = (z->s->img_x + r->hs-1) / r->hs;
         r->ypos    = 0;
         r->line0   = r->line1 = z->img_comp[k].data;

         if      (r->hs == 1 && r->vs == 1) r->resample = resample_row_1;
         else if (r->hs == 1 && r->vs == 2) r->resample = stbi__resample_row_v_2;
         else if (r->hs == 2 && r->vs == 1) r->resample = stbi__resample_row_h_2;
         else if (r->hs == 2 && r->vs == 2) r->resample = z->resample_row_hv_2_kernel;
         else                               r->resample = stbi__resample_row_generic;
      }

      // no error paths remain after this allocation, so this is safe
      output = (stbi_uc *) stbi__malloc(n * z->s->img_x * z->s->img_y + 1);
      if (!output) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }

      // now go ahead and resample
      for (j=0; j < z->s->img_y; ++j) {
         stbi_uc *out = output + n * z->s->img_x * j;
         for (k=0; k < decode_n; ++k) {
            stbi__resample *r = &res_comp[k];
            int y_bot = r->ystep >= (r->vs >> 1);
            coutput[k] = r->resample(z->img_comp[k].linebuf,
                                     y_bot ? r->line1 : r->line0,
                                     y_bot ? r->line0 : r->line1,
                                     r->w_lores, r->hs);
            if (++r->ystep >= r->vs) {
               r->ystep = 0;
               r->line0 = r->line1;
               if (++r->ypos < z->img_comp[k].y)
                  r->line1 += z->img_comp[k].w2;
            }
         }
         if (n >= 3) {
            stbi_uc *y = coutput[0];
            if (z->s->img_n == 3) {
               z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
            } else
               for (i=0; i < z->s->img_x; ++i) {
                  out[0] = out[1] = out[2] = y[i];
                  out[3] = 255; // not used if n==3
                  out += n;
               }
         } else {
            stbi_uc *y = coutput[0];
            if (n == 1)
               for (i=0; i < z->s->img_x; ++i) out[i] = y[i];
            else
               for (i=0; i < z->s->img_x; ++i) *out++ = y[i], *out++ = 255;
         }
      }
      stbi__cleanup_jpeg(z);
      *out_x = z->s->img_x;
      *out_y = z->s->img_y;
      if (comp) *comp  = z->s->img_n; // report original components, not output
      return output;
   }
}
static unsigned char *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
{
   stbi__jpeg j;
   j.s = s;
   stbi__setup_jpeg(&j);
   return load_jpeg_image(&j, x,y,comp,req_comp);
}

static int stbi__jpeg_test(stbi__context *s)
{
   int r;
   stbi__jpeg j;
   j.s = s;
   stbi__setup_jpeg(&j);
   r = stbi__decode_jpeg_header(&j, STBI__SCAN_type);
   stbi__rewind(s);
   return r;
}

static int stbi__jpeg_info_raw(stbi__jpeg *j, int *x, int *y, int *comp)
{
   if (!stbi__decode_jpeg_header(j, STBI__SCAN_header)) {
      stbi__rewind( j->s );
      return 0;
   }
   if (x) *x = j->s->img_x;
   if (y) *y = j->s->img_y;
   if (comp) *comp = j->s->img_n;
   return 1;
}

static int stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp)
{
   stbi__jpeg j;
   j.s = s;
   return stbi__jpeg_info_raw(&j, x, y, comp);
}
#endif
// public domain zlib decode    v0.2  Sean Barrett 2006-11-18
//    simple implementation
//      - all input must be provided in an upfront buffer
//      - all output is written to a single output buffer (can malloc/realloc)
//    performance
//      - fast huffman

#ifndef STBI_NO_ZLIB

// fast-way is faster to check than jpeg huffman, but slow way is slower
#define STBI__ZFAST_BITS  9 // accelerate all cases in default tables
#define STBI__ZFAST_MASK  ((1 << STBI__ZFAST_BITS) - 1)

// zlib-style huffman encoding
// (jpeg packs from left, zlib from right, so can't share code)
typedef struct
{
   stbi__uint16 fast[1 << STBI__ZFAST_BITS];
   stbi__uint16 firstcode[16];
   int maxcode[17];
   stbi__uint16 firstsymbol[16];
   stbi_uc  size[288];
   stbi__uint16 value[288];
} stbi__zhuffman;

stbi_inline static int stbi__bitreverse16(int n)
{
  n = ((n & 0xAAAA) >>  1) | ((n & 0x5555) << 1);
  n = ((n & 0xCCCC) >>  2) | ((n & 0x3333) << 2);
  n = ((n & 0xF0F0) >>  4) | ((n & 0x0F0F) << 4);
  n = ((n & 0xFF00) >>  8) | ((n & 0x00FF) << 8);
  return n;
}

stbi_inline static int stbi__bit_reverse(int v, int bits)
{
   STBI_ASSERT(bits <= 16);
   // to bit reverse n bits, reverse 16 and shift
   // e.g. 11 bits, bit reverse and shift away 5
   return stbi__bitreverse16(v) >> (16-bits);
}
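// Illustrative sketch, not part of stb_image: the "reverse 16 bits, then
// shift away the unused ones" trick from the two helpers above, written as
// standalone functions. The names sketch_reverse16 and sketch_reverse_bits
// are hypothetical.

```c
#include <assert.h>

/* Reverse all 16 bits by swapping progressively larger groups. */
unsigned sketch_reverse16(unsigned n)
{
   n &= 0xFFFFu;
   n = ((n & 0xAAAAu) >> 1) | ((n & 0x5555u) << 1);   /* swap adjacent bits */
   n = ((n & 0xCCCCu) >> 2) | ((n & 0x3333u) << 2);   /* swap bit pairs     */
   n = ((n & 0xF0F0u) >> 4) | ((n & 0x0F0Fu) << 4);   /* swap nibbles       */
   n = ((n & 0xFF00u) >> 8) | ((n & 0x00FFu) << 8);   /* swap bytes         */
   return n;
}

/* Reverse only the low `bits` bits of v (1 <= bits <= 16). */
unsigned sketch_reverse_bits(unsigned v, int bits)
{
   assert(bits >= 1 && bits <= 16);
   return sketch_reverse16(v) >> (16 - bits);
}
```

// For example, the 3-bit code 110 reverses to 011, which is the order in
// which a DEFLATE bit stream delivers it.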
static int stbi__zbuild_huffman(stbi__zhuffman *z, stbi_uc *sizelist, int num)
{
   int i,k=0;
   int code, next_code[16], sizes[17];

   // DEFLATE spec for generating codes
   memset(sizes, 0, sizeof(sizes));
   memset(z->fast, 0, sizeof(z->fast));
   for (i=0; i < num; ++i)
      ++sizes[sizelist[i]];
   sizes[0] = 0;
   for (i=1; i < 16; ++i)
      if (sizes[i] > (1 << i))
         return stbi__err("bad sizes", "Corrupt PNG");
   code = 0;
   for (i=1; i < 16; ++i) {
      next_code[i] = code;
      z->firstcode[i] = (stbi__uint16) code;
      z->firstsymbol[i] = (stbi__uint16) k;
      code = (code + sizes[i]);
      if (sizes[i])
         if (code-1 >= (1 << i)) return stbi__err("bad codelengths","Corrupt PNG");
      z->maxcode[i] = code << (16-i); // preshift for inner loop
      code <<= 1;
      k += sizes[i];
   }
   z->maxcode[16] = 0x10000; // sentinel
   for (i=0; i < num; ++i) {
      int s = sizelist[i];
      if (s) {
         int c = next_code[s] - z->firstcode[s] + z->firstsymbol[s];
         stbi__uint16 fastv = (stbi__uint16) ((s << 9) | i);
         z->size [c] = (stbi_uc     ) s;
         z->value[c] = (stbi__uint16) i;
         if (s <= STBI__ZFAST_BITS) {
            int j = stbi__bit_reverse(next_code[s],s);
            while (j < (1 << STBI__ZFAST_BITS)) {
               z->fast[j] = fastv;
               j += (1 << s);
            }
         }
         ++next_code[s];
      }
   }
   return 1;
}
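// Illustrative sketch, not part of stb_image: canonical Huffman code
// assignment from a list of code lengths, following the procedure in the
// DEFLATE specification (RFC 1951, section 3.2.2). It shows only the code
// numbering step, not the fast lookup table built above. The name
// sketch_assign_codes and the SKETCH_MAX_BITS constant are hypothetical.

```c
#include <assert.h>

#define SKETCH_MAX_BITS 15

/* Fills codes[i] for each symbol; returns 0 if the lengths are invalid. */
int sketch_assign_codes(const unsigned char *lengths, int n, unsigned short *codes)
{
   int bl_count[SKETCH_MAX_BITS+1] = {0};
   int next_code[SKETCH_MAX_BITS+1] = {0};
   int code = 0, bits, i;
   assert(lengths != 0 && codes != 0);
   for (i = 0; i < n; ++i) {
      if (lengths[i] > SKETCH_MAX_BITS) return 0;
      ++bl_count[lengths[i]];                 /* count codes of each length */
   }
   bl_count[0] = 0;                           /* length 0 means "unused symbol" */
   for (bits = 1; bits <= SKETCH_MAX_BITS; ++bits) {
      code = (code + bl_count[bits-1]) << 1;  /* first code of this length */
      next_code[bits] = code;
      if (bl_count[bits] && code + bl_count[bits] - 1 >= (1 << bits))
         return 0;                            /* more codes than fit in `bits` bits */
   }
   for (i = 0; i < n; ++i)
      codes[i] = (unsigned short) (lengths[i] ? next_code[lengths[i]]++ : 0);
   return 1;
}
```

// The RFC's own example, lengths (3,3,3,3,3,2,4,4), yields the codes
// 010, 011, 100, 101, 110, 00, 1110, 1111.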
// zlib-from-memory implementation for PNG reading
//    because PNG allows splitting the zlib stream arbitrarily,
//    and it's annoying structurally to have PNG call ZLIB call PNG,
//    we require PNG read all the IDATs and combine them into a single
//    memory buffer

typedef struct
{
   stbi_uc *zbuffer, *zbuffer_end;
   int num_bits;
   stbi__uint32 code_buffer;

   char *zout;
   char *zout_start;
   char *zout_end;
   int   z_expandable;

   stbi__zhuffman z_length, z_distance;
} stbi__zbuf;

stbi_inline static stbi_uc stbi__zget8(stbi__zbuf *z)
{
   if (z->zbuffer >= z->zbuffer_end) return 0;
   return *z->zbuffer++;
}

static void stbi__fill_bits(stbi__zbuf *z)
{
   do {
      STBI_ASSERT(z->code_buffer < (1U << z->num_bits));
      z->code_buffer |= (unsigned int) stbi__zget8(z) << z->num_bits;
      z->num_bits += 8;
   } while (z->num_bits <= 24);
}

stbi_inline static unsigned int stbi__zreceive(stbi__zbuf *z, int n)
{
   unsigned int k;
   if (z->num_bits < n) stbi__fill_bits(z);
   k = z->code_buffer & ((1 << n) - 1);
   z->code_buffer >>= n;
   z->num_bits -= n;
   return k;
}
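// Illustrative sketch, not part of stb_image: an LSB-first bit reader in the
// style of stbi__fill_bits / stbi__zreceive above. New bytes are OR-ed in
// above the bits already buffered, and values are taken from the low end.
// The type sketch_bits and the function sketch_read_bits are hypothetical;
// like the listing, reads past the end of input yield zero bits.

```c
#include <assert.h>
#include <stdint.h>

typedef struct
{
   const unsigned char *p, *end;   /* remaining input */
   uint32_t buf;                   /* buffered bits, next bit in bit 0 */
   int nbits;                      /* number of valid bits in buf */
} sketch_bits;

/* Read n bits (0 <= n <= 16), least significant bit first. */
unsigned sketch_read_bits(sketch_bits *r, int n)
{
   unsigned v;
   assert(n >= 0 && n <= 16);
   while (r->nbits < n) {
      uint32_t byte = (r->p < r->end) ? *r->p++ : 0;
      r->buf |= byte << r->nbits;  /* append above the existing bits */
      r->nbits += 8;
   }
   v = (unsigned) (r->buf & ((1u << n) - 1));
   r->buf >>= n;
   r->nbits -= n;
   return v;
}
```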
static int stbi__zhuffman_decode_slowpath(stbi__zbuf *a, stbi__zhuffman *z)
{
   int b,s,k;
   // not resolved by fast table, so compute it the slow way
   // use jpeg approach, which requires MSbits at top
   k = stbi__bit_reverse(a->code_buffer, 16);
   for (s=STBI__ZFAST_BITS+1; ; ++s)
      if (k < z->maxcode[s])
         break;
   if (s == 16) return -1; // invalid code!
   // code size is s, so:
   b = (k >> (16-s)) - z->firstcode[s] + z->firstsymbol[s];
   STBI_ASSERT(z->size[b] == s);
   a->code_buffer >>= s;
   a->num_bits -= s;
   return z->value[b];
}

stbi_inline static int stbi__zhuffman_decode(stbi__zbuf *a, stbi__zhuffman *z)
{
   int b,s;
   if (a->num_bits < 16) stbi__fill_bits(a);
   b = z->fast[a->code_buffer & STBI__ZFAST_MASK];
   if (b) {
      s = b >> 9;
      a->code_buffer >>= s;
      a->num_bits -= s;
      return b & 511;
   }
   return stbi__zhuffman_decode_slowpath(a, z);
}
static int stbi__zexpand(stbi__zbuf *z, char *zout, int n)  // need to make room for n bytes
{
   char *q;
   int cur, limit;
   z->zout = zout;
   if (!z->z_expandable) return stbi__err("output buffer limit","Corrupt PNG");
   cur   = (int) (z->zout     - z->zout_start);
   limit = (int) (z->zout_end - z->zout_start);
   while (cur + n > limit)
      limit *= 2;
   q = (char *) STBI_REALLOC(z->zout_start, limit);
   if (q == NULL) return stbi__err("outofmem", "Out of memory");
   z->zout_start = q;
   z->zout       = q + cur;
   z->zout_end   = q + limit;
   return 1;
}
static int stbi__zlength_base[31] = {
   3,4,5,6,7,8,9,10,11,13,
   15,17,19,23,27,31,35,43,51,59,
   67,83,99,115,131,163,195,227,258,0,0 };

static int stbi__zlength_extra[31]=
{ 0,0,0,0,0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,0,0,0 };

static int stbi__zdist_base[32] = { 1,2,3,4,5,7,9,13,17,25,33,49,65,97,129,193,
257,385,513,769,1025,1537,2049,3073,4097,6145,8193,12289,16385,24577,0,0};

static int stbi__zdist_extra[32] =
{ 0,0,0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13};
static int stbi__parse_huffman_block(stbi__zbuf *a)
{
   char *zout = a->zout;
   for(;;) {
      int z = stbi__zhuffman_decode(a, &a->z_length);
      if (z < 256) {
         if (z < 0) return stbi__err("bad huffman code","Corrupt PNG"); // error in huffman codes
         if (zout >= a->zout_end) {
            if (!stbi__zexpand(a, zout, 1)) return 0;
            zout = a->zout;
         }
         *zout++ = (char) z;
      } else {
         stbi_uc *p;
         int len,dist;
         if (z == 256) {
            a->zout = zout;
            return 1;
         }
         z -= 257;
         len = stbi__zlength_base[z];
         if (stbi__zlength_extra[z]) len += stbi__zreceive(a, stbi__zlength_extra[z]);
         z = stbi__zhuffman_decode(a, &a->z_distance);
         if (z < 0) return stbi__err("bad huffman code","Corrupt PNG");
         dist = stbi__zdist_base[z];
         if (stbi__zdist_extra[z]) dist += stbi__zreceive(a, stbi__zdist_extra[z]);
         if (zout - a->zout_start < dist) return stbi__err("bad dist","Corrupt PNG");
         if (zout + len > a->zout_end) {
            if (!stbi__zexpand(a, zout, len)) return 0;
            zout = a->zout;
         }
         p = (stbi_uc *) (zout - dist);
         if (dist == 1) { // run of one byte; common in images.
            stbi_uc v = *p;
            if (len) { do *zout++ = v; while (--len); }
         } else {
            if (len) { do *zout++ = *p++; while (--len); }
         }
      }
   }
}
static int stbi__compute_huffman_codes(stbi__zbuf *a)
{
   static stbi_uc length_dezigzag[19] = { 16,17,18,0,8,7,9,6,10,5,11,4,12,3,13,2,14,1,15 };
   stbi__zhuffman z_codelength;
   stbi_uc lencodes[286+32+137];//padding for maximum single op
   stbi_uc codelength_sizes[19];
   int i,n;

   int hlit  = stbi__zreceive(a,5) + 257;
   int hdist = stbi__zreceive(a,5) + 1;
   int hclen = stbi__zreceive(a,4) + 4;
   int ntot  = hlit + hdist;

   memset(codelength_sizes, 0, sizeof(codelength_sizes));
   for (i=0; i < hclen; ++i) {
      int s = stbi__zreceive(a,3);
      codelength_sizes[length_dezigzag[i]] = (stbi_uc) s;
   }
   if (!stbi__zbuild_huffman(&z_codelength, codelength_sizes, 19)) return 0;

   n = 0;
   while (n < ntot) {
      int c = stbi__zhuffman_decode(a, &z_codelength);
      if (c < 0 || c >= 19) return stbi__err("bad codelengths", "Corrupt PNG");
      if (c < 16)
         lencodes[n++] = (stbi_uc) c;
      else {
         stbi_uc fill = 0;
         if (c == 16) {
            c = stbi__zreceive(a,2)+3;
            // code 16 repeats the previous length, so one must already exist
            if (n == 0) return stbi__err("bad codelengths", "Corrupt PNG");
            fill = lencodes[n-1];
         } else if (c == 17) {
            c = stbi__zreceive(a,3)+3;
         } else {
            STBI_ASSERT(c == 18);
            c = stbi__zreceive(a,7)+11;
         }
         // reject runs that extend past the declared number of code lengths
         if (ntot - n < c) return stbi__err("bad codelengths", "Corrupt PNG");
         memset(lencodes+n, fill, c);
         n += c;
      }
   }
   if (n != ntot) return stbi__err("bad codelengths","Corrupt PNG");
   if (!stbi__zbuild_huffman(&a->z_length, lencodes, hlit)) return 0;
   if (!stbi__zbuild_huffman(&a->z_distance, lencodes+hlit, hdist)) return 0;
   return 1;
}
static int stbi__parse_uncomperssed_block(stbi__zbuf *a)
{
   stbi_uc header[4];
   int len,nlen,k;
   if (a->num_bits & 7)
      stbi__zreceive(a, a->num_bits & 7); // discard
   // drain the bit-packed data into header
   k = 0;
   while (a->num_bits > 0) {
      header[k++] = (stbi_uc) (a->code_buffer & 255); // suppress MSVC run-time check
      a->code_buffer >>= 8;
      a->num_bits -= 8;
   }
   STBI_ASSERT(a->num_bits == 0);
   // now fill header the normal way
   while (k < 4)
      header[k++] = stbi__zget8(a);
   len  = header[1] * 256 + header[0];
   nlen = header[3] * 256 + header[2];
   if (nlen != (len ^ 0xffff)) return stbi__err("zlib corrupt","Corrupt PNG");
   if (a->zbuffer + len > a->zbuffer_end) return stbi__err("read past buffer","Corrupt PNG");
   if (a->zout + len > a->zout_end)
      if (!stbi__zexpand(a, a->zout, len)) return 0;
   memcpy(a->zout, a->zbuffer, len);
   a->zbuffer += len;
   a->zout += len;
   return 1;
}
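// Illustrative sketch, not part of stb_image: the 4-byte header of a DEFLATE
// stored block holds LEN and its one's complement NLEN, both little-endian.
// This standalone check mirrors the len/nlen test above. The function name
// sketch_stored_len is hypothetical.

```c
#include <assert.h>

/* Returns the block length, or -1 if NLEN is not the complement of LEN. */
long sketch_stored_len(const unsigned char h[4])
{
   unsigned len, nlen;
   assert(h != 0);
   len  = (unsigned) h[0] | ((unsigned) h[1] << 8);
   nlen = (unsigned) h[2] | ((unsigned) h[3] << 8);
   if (nlen != (len ^ 0xffffu)) return -1;
   return (long) len;
}
```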
static int stbi__parse_zlib_header(stbi__zbuf *a)
{
   int cmf   = stbi__zget8(a);
   int cm    = cmf & 15;
   /* int cinfo = cmf >> 4; */
   int flg   = stbi__zget8(a);
   if ((cmf*256+flg) % 31 != 0) return stbi__err("bad zlib header","Corrupt PNG"); // zlib spec
   if (flg & 32) return stbi__err("no preset dict","Corrupt PNG"); // preset dictionary not allowed in png
   if (cm != 8) return stbi__err("bad compression","Corrupt PNG"); // DEFLATE required for png
   // window = 1 << (8 + cinfo)... but who cares, we fully buffer output
   return 1;
}
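// Illustrative sketch, not part of stb_image: the three zlib header checks
// above applied to a raw CMF/FLG byte pair. The function name
// sketch_zlib_header_ok is hypothetical.

```c
#include <assert.h>

/* Returns 1 if the two header bytes are acceptable for a PNG zlib stream. */
int sketch_zlib_header_ok(unsigned char cmf, unsigned char flg)
{
   if ((cmf*256 + flg) % 31 != 0) return 0;   /* FCHECK: 16-bit value is a multiple of 31 */
   if (flg & 32) return 0;                    /* FDICT: preset dictionary not allowed */
   if ((cmf & 15) != 8) return 0;             /* CM: 8 means DEFLATE */
   return 1;
}
```

// The common header 0x78 0x9C passes: 0x789C is 30876, which is 31 * 996.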
// @TODO: should statically initialize these for optimal thread safety
static stbi_uc stbi__zdefault_length[288], stbi__zdefault_distance[32];
static void stbi__init_zdefaults(void)
{
   int i;   // use <= to match clearly with spec
   for (i=0; i <= 143; ++i)     stbi__zdefault_length[i]   = 8;
   for (   ; i <= 255; ++i)     stbi__zdefault_length[i]   = 9;
   for (   ; i <= 279; ++i)     stbi__zdefault_length[i]   = 7;
   for (   ; i <= 287; ++i)     stbi__zdefault_length[i]   = 8;

   for (i=0; i <=  31; ++i)     stbi__zdefault_distance[i] = 5;
}

static int stbi__parse_zlib(stbi__zbuf *a, int parse_header)
{
   int final, type;
   if (parse_header)
      if (!stbi__parse_zlib_header(a)) return 0;
   a->num_bits = 0;
   a->code_buffer = 0;
   do {
      final = stbi__zreceive(a,1);
      type = stbi__zreceive(a,2);
      if (type == 0) {
         if (!stbi__parse_uncomperssed_block(a)) return 0;
      } else if (type == 3) {
         return 0;
      } else {
         if (type == 1) {
            // use fixed code lengths
            if (!stbi__zdefault_distance[31]) stbi__init_zdefaults();
            if (!stbi__zbuild_huffman(&a->z_length  , stbi__zdefault_length  , 288)) return 0;
            if (!stbi__zbuild_huffman(&a->z_distance, stbi__zdefault_distance,  32)) return 0;
         } else {
            if (!stbi__compute_huffman_codes(a)) return 0;
         }
         if (!stbi__parse_huffman_block(a)) return 0;
      }
   } while (!final);
   return 1;
}
static int stbi__do_zlib(stbi__zbuf *a, char *obuf, int olen, int exp, int parse_header)
{
   a->zout_start = obuf;
   a->zout       = obuf;
   a->zout_end   = obuf + olen;
   a->z_expandable = exp;

   return stbi__parse_zlib(a, parse_header);
}

STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen)
{
   stbi__zbuf a;
   char *p = (char *) stbi__malloc(initial_size);
   if (p == NULL) return NULL;
   a.zbuffer = (stbi_uc *) buffer;
   a.zbuffer_end = (stbi_uc *) buffer + len;
   if (stbi__do_zlib(&a, p, initial_size, 1, 1)) {
      if (outlen) *outlen = (int) (a.zout - a.zout_start);
      return a.zout_start;
   } else {
      STBI_FREE(a.zout_start);
      return NULL;
   }
}

STBIDEF char *stbi_zlib_decode_malloc(char const *buffer, int len, int *outlen)
{
   return stbi_zlib_decode_malloc_guesssize(buffer, len, 16384, outlen);
}

STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header)
{
   stbi__zbuf a;
   char *p = (char *) stbi__malloc(initial_size);
   if (p == NULL) return NULL;
   a.zbuffer = (stbi_uc *) buffer;
   a.zbuffer_end = (stbi_uc *) buffer + len;
   if (stbi__do_zlib(&a, p, initial_size, 1, parse_header)) {
      if (outlen) *outlen = (int) (a.zout - a.zout_start);
      return a.zout_start;
   } else {
      STBI_FREE(a.zout_start);
      return NULL;
   }
}

STBIDEF int stbi_zlib_decode_buffer(char *obuffer, int olen, char const *ibuffer, int ilen)
{
   stbi__zbuf a;
   a.zbuffer = (stbi_uc *) ibuffer;
   a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
   if (stbi__do_zlib(&a, obuffer, olen, 0, 1))
      return (int) (a.zout - a.zout_start);
   else
      return -1;
}

STBIDEF char *stbi_zlib_decode_noheader_malloc(char const *buffer, int len, int *outlen)
{
   stbi__zbuf a;
   char *p = (char *) stbi__malloc(16384);
   if (p == NULL) return NULL;
   a.zbuffer = (stbi_uc *) buffer;
   a.zbuffer_end = (stbi_uc *) buffer+len;
   if (stbi__do_zlib(&a, p, 16384, 1, 0)) {
      if (outlen) *outlen = (int) (a.zout - a.zout_start);
      return a.zout_start;
   } else {
      STBI_FREE(a.zout_start);
      return NULL;
   }
}
3894 STBIDEF int stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen)
3896 stbi__zbuf a;
3897 a.zbuffer = (stbi_uc *) ibuffer;
3898 a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
3899 if (stbi__do_zlib(&a, obuffer, olen, 0, 0))
3900 return (int) (a.zout - a.zout_start);
3901 else
3902 return -1;
3904 #endif
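
/* Illustrative sketch only, not part of stb_image: the three header checks
 * made by stbi__parse_zlib_header, restated over a plain two-byte buffer.
 * example_zlib_header_ok is a hypothetical name; the rules are the RFC 1950
 * check-bits rule plus the PNG restrictions noted in that function. */
static int example_zlib_header_ok(const unsigned char *hdr)
{
   int cmf = hdr[0];   /* compression method (low 4 bits) and window info */
   int flg = hdr[1];   /* check bits, preset-dictionary flag, level */
   if ((cmf*256 + flg) % 31 != 0) return 0;   /* must be a multiple of 31 */
   if (flg & 32)                  return 0;   /* preset dictionary not allowed in PNG */
   if ((cmf & 15) != 8)           return 0;   /* method 8 = DEFLATE */
   return 1;
}
/* For instance, the common header bytes 0x78 0x9c pass all three checks. */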
3906 // public domain "baseline" PNG decoder v0.10 Sean Barrett 2006-11-18
3907 // simple implementation
3908 // - only 8-bit samples
3909 // - no CRC checking
3910 // - allocates lots of intermediate memory
3911 // - avoids problem of streaming data between subsystems
3912 // - avoids explicit window management
3913 // performance
3914 // - uses stb_zlib, a PD zlib implementation with fast huffman decoding
3916 #ifndef STBI_NO_PNG
typedef struct
{
   stbi__uint32 length;
   stbi__uint32 type;
} stbi__pngchunk;

static stbi__pngchunk stbi__get_chunk_header(stbi__context *s)
{
   stbi__pngchunk c;
   c.length = stbi__get32be(s);
   c.type   = stbi__get32be(s);
   return c;
}

static int stbi__check_png_header(stbi__context *s)
{
   static stbi_uc png_sig[8] = { 137,80,78,71,13,10,26,10 };
   int i;
   for (i=0; i < 8; ++i)
      if (stbi__get8(s) != png_sig[i]) return stbi__err("bad png sig","Not a PNG");
   return 1;
}

typedef struct
{
   stbi__context *s;
   stbi_uc *idata, *expanded, *out;
} stbi__png;

enum {
   STBI__F_none=0,
   STBI__F_sub=1,
   STBI__F_up=2,
   STBI__F_avg=3,
   STBI__F_paeth=4,
   // synthetic filters used for first scanline to avoid needing a dummy row of 0s
   STBI__F_avg_first,
   STBI__F_paeth_first
};

static stbi_uc first_row_filter[5] =
{
   STBI__F_none,
   STBI__F_sub,
   STBI__F_none,
   STBI__F_avg_first,
   STBI__F_paeth_first
};

// Paeth predictor: a = left, b = above, c = upper-left.
// Returns whichever of the three is closest to a+b-c, preferring a, then b, on ties.
static int stbi__paeth(int a, int b, int c)
{
   int p = a + b - c;
   int pa = abs(p-a);
   int pb = abs(p-b);
   int pc = abs(p-c);
   if (pa <= pb && pa <= pc) return a;
   if (pb <= pc) return b;
   return c;
}

3978 static stbi_uc stbi__depth_scale_table[9] = { 0, 0xff, 0x55, 0, 0x11, 0,0,0, 0x01 };
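
/* Illustrative sketch only, not part of stb_image: where the non-zero entries
 * of the table above (0xff, 0x55, 0x11, 0x01) come from. A d-bit grey sample
 * spans 0..(2^d - 1), and multiplying by 255/(2^d - 1) maps that range onto
 * 0..255 exactly. example_scale_sample is a hypothetical helper and assumes
 * depth is 1, 2, 4 or 8. */
static unsigned char example_scale_sample(unsigned int v, int depth)
{
   unsigned int maxv = (1u << depth) - 1;        /* 1, 3, 15 or 255 */
   return (unsigned char) (v * (255u / maxv));   /* factor: 255, 85, 17 or 1 */
}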
3980 // create the png data from post-deflated data
3981 static int stbi__create_png_image_raw(stbi__png *a, stbi_uc *raw, stbi__uint32 raw_len, int out_n, stbi__uint32 x, stbi__uint32 y, int depth, int color)
3983 stbi__context *s = a->s;
3984 stbi__uint32 i,j,stride = x*out_n;
3985 stbi__uint32 img_len, img_width_bytes;
3986 int k;
3987 int img_n = s->img_n; // copy it into a local for later
3989 STBI_ASSERT(out_n == s->img_n || out_n == s->img_n+1);
   a->out = (stbi_uc *) stbi__malloc(x * y * out_n); // exactly x*y*out_n bytes; no padding is reserved past the last row
3991 if (!a->out) return stbi__err("outofmem", "Out of memory");
3993 img_width_bytes = (((img_n * x * depth) + 7) >> 3);
3994 img_len = (img_width_bytes + 1) * y;
3995 if (s->img_x == x && s->img_y == y) {
3996 if (raw_len != img_len) return stbi__err("not enough pixels","Corrupt PNG");
3997 } else { // interlaced:
3998 if (raw_len < img_len) return stbi__err("not enough pixels","Corrupt PNG");
4001 for (j=0; j < y; ++j) {
4002 stbi_uc *cur = a->out + stride*j;
4003 stbi_uc *prior = cur - stride;
4004 int filter = *raw++;
4005 int filter_bytes = img_n;
4006 int width = x;
4007 if (filter > 4)
4008 return stbi__err("invalid filter","Corrupt PNG");
4010 if (depth < 8) {
4011 STBI_ASSERT(img_width_bytes <= x);
4012 cur += x*out_n - img_width_bytes; // store output to the rightmost img_len bytes, so we can decode in place
4013 filter_bytes = 1;
4014 width = img_width_bytes;
4017 // if first row, use special filter that doesn't sample previous row
4018 if (j == 0) filter = first_row_filter[filter];
4020 // handle first byte explicitly
4021 for (k=0; k < filter_bytes; ++k) {
4022 switch (filter) {
4023 case STBI__F_none : cur[k] = raw[k]; break;
4024 case STBI__F_sub : cur[k] = raw[k]; break;
4025 case STBI__F_up : cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break;
4026 case STBI__F_avg : cur[k] = STBI__BYTECAST(raw[k] + (prior[k]>>1)); break;
4027 case STBI__F_paeth : cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(0,prior[k],0)); break;
4028 case STBI__F_avg_first : cur[k] = raw[k]; break;
4029 case STBI__F_paeth_first: cur[k] = raw[k]; break;
4033 if (depth == 8) {
4034 if (img_n != out_n)
4035 cur[img_n] = 255; // first pixel
4036 raw += img_n;
4037 cur += out_n;
4038 prior += out_n;
4039 } else {
4040 raw += 1;
4041 cur += 1;
4042 prior += 1;
4045 // this is a little gross, so that we don't switch per-pixel or per-component
4046 if (depth < 8 || img_n == out_n) {
4047 int nk = (width - 1)*img_n;
4048 #define CASE(f) \
4049 case f: \
4050 for (k=0; k < nk; ++k)
4051 switch (filter) {
4052 // "none" filter turns into a memcpy here; make that explicit.
4053 case STBI__F_none: memcpy(cur, raw, nk); break;
4054 CASE(STBI__F_sub) cur[k] = STBI__BYTECAST(raw[k] + cur[k-filter_bytes]); break;
4055 CASE(STBI__F_up) cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break;
4056 CASE(STBI__F_avg) cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k-filter_bytes])>>1)); break;
4057 CASE(STBI__F_paeth) cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],prior[k],prior[k-filter_bytes])); break;
4058 CASE(STBI__F_avg_first) cur[k] = STBI__BYTECAST(raw[k] + (cur[k-filter_bytes] >> 1)); break;
4059 CASE(STBI__F_paeth_first) cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],0,0)); break;
4061 #undef CASE
4062 raw += nk;
4063 } else {
4064 STBI_ASSERT(img_n+1 == out_n);
4065 #define CASE(f) \
4066 case f: \
4067 for (i=x-1; i >= 1; --i, cur[img_n]=255,raw+=img_n,cur+=out_n,prior+=out_n) \
4068 for (k=0; k < img_n; ++k)
4069 switch (filter) {
4070 CASE(STBI__F_none) cur[k] = raw[k]; break;
4071 CASE(STBI__F_sub) cur[k] = STBI__BYTECAST(raw[k] + cur[k-out_n]); break;
4072 CASE(STBI__F_up) cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break;
4073 CASE(STBI__F_avg) cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k-out_n])>>1)); break;
4074 CASE(STBI__F_paeth) cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-out_n],prior[k],prior[k-out_n])); break;
4075 CASE(STBI__F_avg_first) cur[k] = STBI__BYTECAST(raw[k] + (cur[k-out_n] >> 1)); break;
4076 CASE(STBI__F_paeth_first) cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-out_n],0,0)); break;
4078 #undef CASE
   // we make a separate pass to expand bits to pixels; for performance,
   // this could run two scanlines behind the above code, so it won't
   // interfere with filtering but will still be in the cache.
4085 if (depth < 8) {
4086 for (j=0; j < y; ++j) {
4087 stbi_uc *cur = a->out + stride*j;
4088 stbi_uc *in = a->out + stride*j + x*out_n - img_width_bytes;
         // unpack 1/2/4-bit into an 8-bit buffer. allows us to keep the common 8-bit path optimal at minimal cost for 1/2/4-bit
         // PNG guarantees byte alignment; if width is not a multiple of 8/4/2 we'll decode dummy trailing data that will be skipped in the later loop
4091 stbi_uc scale = (color == 0) ? stbi__depth_scale_table[depth] : 1; // scale grayscale values to 0..255 range
4093 // note that the final byte might overshoot and write more data than desired.
4094 // we can allocate enough data that this never writes out of memory, but it
4095 // could also overwrite the next scanline. can it overwrite non-empty data
4096 // on the next scanline? yes, consider 1-pixel-wide scanlines with 1-bit-per-pixel.
4097 // so we need to explicitly clamp the final ones
4099 if (depth == 4) {
4100 for (k=x*img_n; k >= 2; k-=2, ++in) {
4101 *cur++ = scale * ((*in >> 4) );
4102 *cur++ = scale * ((*in ) & 0x0f);
4104 if (k > 0) *cur++ = scale * ((*in >> 4) );
4105 } else if (depth == 2) {
4106 for (k=x*img_n; k >= 4; k-=4, ++in) {
4107 *cur++ = scale * ((*in >> 6) );
4108 *cur++ = scale * ((*in >> 4) & 0x03);
4109 *cur++ = scale * ((*in >> 2) & 0x03);
4110 *cur++ = scale * ((*in ) & 0x03);
4112 if (k > 0) *cur++ = scale * ((*in >> 6) );
4113 if (k > 1) *cur++ = scale * ((*in >> 4) & 0x03);
4114 if (k > 2) *cur++ = scale * ((*in >> 2) & 0x03);
4115 } else if (depth == 1) {
4116 for (k=x*img_n; k >= 8; k-=8, ++in) {
4117 *cur++ = scale * ((*in >> 7) );
4118 *cur++ = scale * ((*in >> 6) & 0x01);
4119 *cur++ = scale * ((*in >> 5) & 0x01);
4120 *cur++ = scale * ((*in >> 4) & 0x01);
4121 *cur++ = scale * ((*in >> 3) & 0x01);
4122 *cur++ = scale * ((*in >> 2) & 0x01);
4123 *cur++ = scale * ((*in >> 1) & 0x01);
4124 *cur++ = scale * ((*in ) & 0x01);
4126 if (k > 0) *cur++ = scale * ((*in >> 7) );
4127 if (k > 1) *cur++ = scale * ((*in >> 6) & 0x01);
4128 if (k > 2) *cur++ = scale * ((*in >> 5) & 0x01);
4129 if (k > 3) *cur++ = scale * ((*in >> 4) & 0x01);
4130 if (k > 4) *cur++ = scale * ((*in >> 3) & 0x01);
4131 if (k > 5) *cur++ = scale * ((*in >> 2) & 0x01);
4132 if (k > 6) *cur++ = scale * ((*in >> 1) & 0x01);
4134 if (img_n != out_n) {
4135 int q;
4136 // insert alpha = 255
4137 cur = a->out + stride*j;
4138 if (img_n == 1) {
4139 for (q=x-1; q >= 0; --q) {
4140 cur[q*2+1] = 255;
4141 cur[q*2+0] = cur[q];
4143 } else {
4144 STBI_ASSERT(img_n == 3);
4145 for (q=x-1; q >= 0; --q) {
4146 cur[q*4+3] = 255;
4147 cur[q*4+2] = cur[q*3+2];
4148 cur[q*4+1] = cur[q*3+1];
4149 cur[q*4+0] = cur[q*3+0];
4156 return 1;
4159 static int stbi__create_png_image(stbi__png *a, stbi_uc *image_data, stbi__uint32 image_data_len, int out_n, int depth, int color, int interlaced)
4161 stbi_uc *final;
4162 int p;
4163 if (!interlaced)
4164 return stbi__create_png_image_raw(a, image_data, image_data_len, out_n, a->s->img_x, a->s->img_y, depth, color);
   // de-interlacing
   final = (stbi_uc *) stbi__malloc(a->s->img_x * a->s->img_y * out_n);
   if (final == NULL) return stbi__err("outofmem", "Out of memory");
4168 for (p=0; p < 7; ++p) {
4169 int xorig[] = { 0,4,0,2,0,1,0 };
4170 int yorig[] = { 0,0,4,0,2,0,1 };
4171 int xspc[] = { 8,8,4,4,2,2,1 };
4172 int yspc[] = { 8,8,8,4,4,2,2 };
4173 int i,j,x,y;
      // pass size = ceil((img size - orig) / spc); e.g. for the second pass
      // (xorig=4, xspc=8) a width of 4 gives x=0, 5 gives x=1, 12 gives x=1
4175 x = (a->s->img_x - xorig[p] + xspc[p]-1) / xspc[p];
4176 y = (a->s->img_y - yorig[p] + yspc[p]-1) / yspc[p];
4177 if (x && y) {
4178 stbi__uint32 img_len = ((((a->s->img_n * x * depth) + 7) >> 3) + 1) * y;
4179 if (!stbi__create_png_image_raw(a, image_data, image_data_len, out_n, x, y, depth, color)) {
4180 STBI_FREE(final);
4181 return 0;
4183 for (j=0; j < y; ++j) {
4184 for (i=0; i < x; ++i) {
4185 int out_y = j*yspc[p]+yorig[p];
4186 int out_x = i*xspc[p]+xorig[p];
4187 memcpy(final + out_y*a->s->img_x*out_n + out_x*out_n,
4188 a->out + (j*x+i)*out_n, out_n);
4191 STBI_FREE(a->out);
4192 image_data += img_len;
4193 image_data_len -= img_len;
4196 a->out = final;
4198 return 1;
4201 static int stbi__compute_transparency(stbi__png *z, stbi_uc tc[3], int out_n)
4203 stbi__context *s = z->s;
4204 stbi__uint32 i, pixel_count = s->img_x * s->img_y;
4205 stbi_uc *p = z->out;
4207 // compute color-based transparency, assuming we've
4208 // already got 255 as the alpha value in the output
4209 STBI_ASSERT(out_n == 2 || out_n == 4);
4211 if (out_n == 2) {
4212 for (i=0; i < pixel_count; ++i) {
4213 p[1] = (p[0] == tc[0] ? 0 : 255);
4214 p += 2;
4216 } else {
4217 for (i=0; i < pixel_count; ++i) {
4218 if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
4219 p[3] = 0;
4220 p += 4;
4223 return 1;
4226 static int stbi__expand_png_palette(stbi__png *a, stbi_uc *palette, int len, int pal_img_n)
4228 stbi__uint32 i, pixel_count = a->s->img_x * a->s->img_y;
4229 stbi_uc *p, *temp_out, *orig = a->out;
4231 p = (stbi_uc *) stbi__malloc(pixel_count * pal_img_n);
4232 if (p == NULL) return stbi__err("outofmem", "Out of memory");
   // between here and free(out) below, exiting would leak
4235 temp_out = p;
4237 if (pal_img_n == 3) {
4238 for (i=0; i < pixel_count; ++i) {
4239 int n = orig[i]*4;
4240 p[0] = palette[n ];
4241 p[1] = palette[n+1];
4242 p[2] = palette[n+2];
4243 p += 3;
4245 } else {
4246 for (i=0; i < pixel_count; ++i) {
4247 int n = orig[i]*4;
4248 p[0] = palette[n ];
4249 p[1] = palette[n+1];
4250 p[2] = palette[n+2];
4251 p[3] = palette[n+3];
4252 p += 4;
4255 STBI_FREE(a->out);
4256 a->out = temp_out;
4258 STBI_NOTUSED(len);
4260 return 1;
4263 static int stbi__unpremultiply_on_load = 0;
4264 static int stbi__de_iphone_flag = 0;
4266 STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply)
4268 stbi__unpremultiply_on_load = flag_true_if_should_unpremultiply;
4271 STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert)
4273 stbi__de_iphone_flag = flag_true_if_should_convert;
4276 static void stbi__de_iphone(stbi__png *z)
4278 stbi__context *s = z->s;
4279 stbi__uint32 i, pixel_count = s->img_x * s->img_y;
4280 stbi_uc *p = z->out;
4282 if (s->img_out_n == 3) { // convert bgr to rgb
4283 for (i=0; i < pixel_count; ++i) {
4284 stbi_uc t = p[0];
4285 p[0] = p[2];
4286 p[2] = t;
4287 p += 3;
4289 } else {
4290 STBI_ASSERT(s->img_out_n == 4);
4291 if (stbi__unpremultiply_on_load) {
4292 // convert bgr to rgb and unpremultiply
4293 for (i=0; i < pixel_count; ++i) {
4294 stbi_uc a = p[3];
4295 stbi_uc t = p[0];
4296 if (a) {
4297 p[0] = p[2] * 255 / a;
4298 p[1] = p[1] * 255 / a;
4299 p[2] = t * 255 / a;
4300 } else {
4301 p[0] = p[2];
4302 p[2] = t;
4304 p += 4;
4306 } else {
4307 // convert bgr to rgb
4308 for (i=0; i < pixel_count; ++i) {
4309 stbi_uc t = p[0];
4310 p[0] = p[2];
4311 p[2] = t;
4312 p += 4;
4318 #define STBI__PNG_TYPE(a,b,c,d) (((a) << 24) + ((b) << 16) + ((c) << 8) + (d))
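
/* Illustrative sketch only, not part of stb_image: how a four-character chunk
 * name packs into one 32-bit value (as STBI__PNG_TYPE does) and how the
 * "ancillary" property is read back. In PNG, a lowercase first letter (bit 5
 * of that byte set) marks a chunk a decoder may skip; once packed, that is
 * bit 29, which is what the `c.type & (1 << 29)` test in the parser's default
 * case looks at. The EXAMPLE_ and example_ names are hypothetical. */
#define EXAMPLE_PNG_TYPE(a,b,c,d) \
   (((unsigned int) (a) << 24) + ((unsigned int) (b) << 16) + ((unsigned int) (c) << 8) + (unsigned int) (d))
static int example_chunk_is_ancillary(unsigned int type)
{
   return (type & (1u << 29)) != 0;   /* bit 5 of the first name byte */
}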
4320 static int stbi__parse_png_file(stbi__png *z, int scan, int req_comp)
4322 stbi_uc palette[1024], pal_img_n=0;
4323 stbi_uc has_trans=0, tc[3];
4324 stbi__uint32 ioff=0, idata_limit=0, i, pal_len=0;
4325 int first=1,k,interlace=0, color=0, depth=0, is_iphone=0;
4326 stbi__context *s = z->s;
4328 z->expanded = NULL;
4329 z->idata = NULL;
4330 z->out = NULL;
4332 if (!stbi__check_png_header(s)) return 0;
4334 if (scan == STBI__SCAN_type) return 1;
4336 for (;;) {
4337 stbi__pngchunk c = stbi__get_chunk_header(s);
4338 switch (c.type) {
4339 case STBI__PNG_TYPE('C','g','B','I'):
4340 is_iphone = 1;
4341 stbi__skip(s, c.length);
4342 break;
4343 case STBI__PNG_TYPE('I','H','D','R'): {
4344 int comp,filter;
4345 if (!first) return stbi__err("multiple IHDR","Corrupt PNG");
4346 first = 0;
4347 if (c.length != 13) return stbi__err("bad IHDR len","Corrupt PNG");
4348 s->img_x = stbi__get32be(s); if (s->img_x > (1 << 24)) return stbi__err("too large","Very large image (corrupt?)");
4349 s->img_y = stbi__get32be(s); if (s->img_y > (1 << 24)) return stbi__err("too large","Very large image (corrupt?)");
4350 depth = stbi__get8(s); if (depth != 1 && depth != 2 && depth != 4 && depth != 8) return stbi__err("1/2/4/8-bit only","PNG not supported: 1/2/4/8-bit only");
4351 color = stbi__get8(s); if (color > 6) return stbi__err("bad ctype","Corrupt PNG");
4352 if (color == 3) pal_img_n = 3; else if (color & 1) return stbi__err("bad ctype","Corrupt PNG");
4353 comp = stbi__get8(s); if (comp) return stbi__err("bad comp method","Corrupt PNG");
4354 filter= stbi__get8(s); if (filter) return stbi__err("bad filter method","Corrupt PNG");
4355 interlace = stbi__get8(s); if (interlace>1) return stbi__err("bad interlace method","Corrupt PNG");
4356 if (!s->img_x || !s->img_y) return stbi__err("0-pixel image","Corrupt PNG");
4357 if (!pal_img_n) {
4358 s->img_n = (color & 2 ? 3 : 1) + (color & 4 ? 1 : 0);
4359 if ((1 << 30) / s->img_x / s->img_n < s->img_y) return stbi__err("too large", "Image too large to decode");
4360 if (scan == STBI__SCAN_header) return 1;
4361 } else {
               // if paletted, then pal_img_n is our final components, and
               // img_n is # components to decompress/filter.
4364 s->img_n = 1;
4365 if ((1 << 30) / s->img_x / 4 < s->img_y) return stbi__err("too large","Corrupt PNG");
4366 // if SCAN_header, have to scan to see if we have a tRNS
4368 break;
4371 case STBI__PNG_TYPE('P','L','T','E'): {
4372 if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4373 if (c.length > 256*3) return stbi__err("invalid PLTE","Corrupt PNG");
4374 pal_len = c.length / 3;
4375 if (pal_len * 3 != c.length) return stbi__err("invalid PLTE","Corrupt PNG");
4376 for (i=0; i < pal_len; ++i) {
4377 palette[i*4+0] = stbi__get8(s);
4378 palette[i*4+1] = stbi__get8(s);
4379 palette[i*4+2] = stbi__get8(s);
4380 palette[i*4+3] = 255;
4382 break;
4385 case STBI__PNG_TYPE('t','R','N','S'): {
4386 if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4387 if (z->idata) return stbi__err("tRNS after IDAT","Corrupt PNG");
4388 if (pal_img_n) {
4389 if (scan == STBI__SCAN_header) { s->img_n = 4; return 1; }
4390 if (pal_len == 0) return stbi__err("tRNS before PLTE","Corrupt PNG");
4391 if (c.length > pal_len) return stbi__err("bad tRNS len","Corrupt PNG");
4392 pal_img_n = 4;
4393 for (i=0; i < c.length; ++i)
4394 palette[i*4+3] = stbi__get8(s);
4395 } else {
4396 if (!(s->img_n & 1)) return stbi__err("tRNS with alpha","Corrupt PNG");
4397 if (c.length != (stbi__uint32) s->img_n*2) return stbi__err("bad tRNS len","Corrupt PNG");
4398 has_trans = 1;
4399 for (k=0; k < s->img_n; ++k)
4400 tc[k] = (stbi_uc) (stbi__get16be(s) & 255) * stbi__depth_scale_table[depth]; // non 8-bit images will be larger
4402 break;
4405 case STBI__PNG_TYPE('I','D','A','T'): {
4406 if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4407 if (pal_img_n && !pal_len) return stbi__err("no PLTE","Corrupt PNG");
4408 if (scan == STBI__SCAN_header) { s->img_n = pal_img_n; return 1; }
4409 if ((int)(ioff + c.length) < (int)ioff) return 0;
4410 if (ioff + c.length > idata_limit) {
4411 stbi_uc *p;
4412 if (idata_limit == 0) idata_limit = c.length > 4096 ? c.length : 4096;
4413 while (ioff + c.length > idata_limit)
4414 idata_limit *= 2;
4415 p = (stbi_uc *) STBI_REALLOC(z->idata, idata_limit); if (p == NULL) return stbi__err("outofmem", "Out of memory");
4416 z->idata = p;
4418 if (!stbi__getn(s, z->idata+ioff,c.length)) return stbi__err("outofdata","Corrupt PNG");
4419 ioff += c.length;
4420 break;
4423 case STBI__PNG_TYPE('I','E','N','D'): {
4424 stbi__uint32 raw_len, bpl;
4425 if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4426 if (scan != STBI__SCAN_load) return 1;
4427 if (z->idata == NULL) return stbi__err("no IDAT","Corrupt PNG");
4428 // initial guess for decoded data size to avoid unnecessary reallocs
4429 bpl = (s->img_x * depth + 7) / 8; // bytes per line, per component
4430 raw_len = bpl * s->img_y * s->img_n /* pixels */ + s->img_y /* filter mode per row */;
4431 z->expanded = (stbi_uc *) stbi_zlib_decode_malloc_guesssize_headerflag((char *) z->idata, ioff, raw_len, (int *) &raw_len, !is_iphone);
4432 if (z->expanded == NULL) return 0; // zlib should set error
4433 STBI_FREE(z->idata); z->idata = NULL;
4434 if ((req_comp == s->img_n+1 && req_comp != 3 && !pal_img_n) || has_trans)
4435 s->img_out_n = s->img_n+1;
4436 else
4437 s->img_out_n = s->img_n;
4438 if (!stbi__create_png_image(z, z->expanded, raw_len, s->img_out_n, depth, color, interlace)) return 0;
4439 if (has_trans)
4440 if (!stbi__compute_transparency(z, tc, s->img_out_n)) return 0;
4441 if (is_iphone && stbi__de_iphone_flag && s->img_out_n > 2)
4442 stbi__de_iphone(z);
4443 if (pal_img_n) {
4444 // pal_img_n == 3 or 4
4445 s->img_n = pal_img_n; // record the actual colors we had
4446 s->img_out_n = pal_img_n;
4447 if (req_comp >= 3) s->img_out_n = req_comp;
4448 if (!stbi__expand_png_palette(z, palette, pal_len, s->img_out_n))
4449 return 0;
4451 STBI_FREE(z->expanded); z->expanded = NULL;
4452 return 1;
4455 default:
4456 // if critical, fail
4457 if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4458 if ((c.type & (1 << 29)) == 0) {
4459 #ifndef STBI_NO_FAILURE_STRINGS
4460 // not threadsafe
4461 static char invalid_chunk[] = "XXXX PNG chunk not known";
4462 invalid_chunk[0] = STBI__BYTECAST(c.type >> 24);
4463 invalid_chunk[1] = STBI__BYTECAST(c.type >> 16);
4464 invalid_chunk[2] = STBI__BYTECAST(c.type >> 8);
4465 invalid_chunk[3] = STBI__BYTECAST(c.type >> 0);
4466 #endif
4467 return stbi__err(invalid_chunk, "PNG not supported: unknown PNG chunk type");
4469 stbi__skip(s, c.length);
4470 break;
4472 // end of PNG chunk, read and skip CRC
4473 stbi__get32be(s);
4477 static unsigned char *stbi__do_png(stbi__png *p, int *x, int *y, int *n, int req_comp)
4479 unsigned char *result=NULL;
4480 if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
4481 if (stbi__parse_png_file(p, STBI__SCAN_load, req_comp)) {
4482 result = p->out;
4483 p->out = NULL;
4484 if (req_comp && req_comp != p->s->img_out_n) {
4485 result = stbi__convert_format(result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
4486 p->s->img_out_n = req_comp;
4487 if (result == NULL) return result;
4489 *x = p->s->img_x;
4490 *y = p->s->img_y;
4491 if (n) *n = p->s->img_out_n;
4493 STBI_FREE(p->out); p->out = NULL;
4494 STBI_FREE(p->expanded); p->expanded = NULL;
4495 STBI_FREE(p->idata); p->idata = NULL;
4497 return result;
4500 static unsigned char *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
4502 stbi__png p;
4503 p.s = s;
4504 return stbi__do_png(&p, x,y,comp,req_comp);
4507 static int stbi__png_test(stbi__context *s)
4509 int r;
4510 r = stbi__check_png_header(s);
4511 stbi__rewind(s);
4512 return r;
4515 static int stbi__png_info_raw(stbi__png *p, int *x, int *y, int *comp)
4517 if (!stbi__parse_png_file(p, STBI__SCAN_header, 0)) {
4518 stbi__rewind( p->s );
4519 return 0;
4521 if (x) *x = p->s->img_x;
4522 if (y) *y = p->s->img_y;
4523 if (comp) *comp = p->s->img_n;
4524 return 1;
4527 static int stbi__png_info(stbi__context *s, int *x, int *y, int *comp)
4529 stbi__png p;
4530 p.s = s;
4531 return stbi__png_info_raw(&p, x, y, comp);
4533 #endif
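
/* Illustrative sketch only, not part of stb_image: the per-pass image size
 * used when de-interlacing in stbi__create_png_image, pulled out into a
 * standalone helper. Pass p covers every xspc[p]-th column starting at
 * xorig[p] (and likewise for rows), so its size is a ceiling division.
 * example_adam7_pass_size is a hypothetical name; the explicit "greater than
 * origin" test is an assumption made here for readability and gives the same
 * result as the unsigned arithmetic in the decoder. */
static void example_adam7_pass_size(unsigned int w, unsigned int h, int pass,
                                    unsigned int *pw, unsigned int *ph)
{
   static const unsigned int xorig[7] = { 0,4,0,2,0,1,0 };
   static const unsigned int yorig[7] = { 0,0,4,0,2,0,1 };
   static const unsigned int xspc[7]  = { 8,8,4,4,2,2,1 };
   static const unsigned int yspc[7]  = { 8,8,8,4,4,2,2 };
   /* columns and rows that fall in this pass; 0 if the image is too small */
   *pw = (w > xorig[pass]) ? (w - xorig[pass] + xspc[pass] - 1) / xspc[pass] : 0;
   *ph = (h > yorig[pass]) ? (h - yorig[pass] + yspc[pass] - 1) / yspc[pass] : 0;
}
/* Summing pw*ph over the seven passes should give back w*h, since every pixel
 * belongs to exactly one pass. */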
4535 // Microsoft/Windows BMP image
4537 #ifndef STBI_NO_BMP
4538 static int stbi__bmp_test_raw(stbi__context *s)
4540 int r;
4541 int sz;
4542 if (stbi__get8(s) != 'B') return 0;
4543 if (stbi__get8(s) != 'M') return 0;
4544 stbi__get32le(s); // discard filesize
4545 stbi__get16le(s); // discard reserved
4546 stbi__get16le(s); // discard reserved
4547 stbi__get32le(s); // discard data offset
4548 sz = stbi__get32le(s);
4549 r = (sz == 12 || sz == 40 || sz == 56 || sz == 108 || sz == 124);
4550 return r;
4553 static int stbi__bmp_test(stbi__context *s)
4555 int r = stbi__bmp_test_raw(s);
4556 stbi__rewind(s);
4557 return r;
4561 // returns 0..31 for the highest set bit
4562 static int stbi__high_bit(unsigned int z)
4564 int n=0;
4565 if (z == 0) return -1;
4566 if (z >= 0x10000) n += 16, z >>= 16;
4567 if (z >= 0x00100) n += 8, z >>= 8;
4568 if (z >= 0x00010) n += 4, z >>= 4;
4569 if (z >= 0x00004) n += 2, z >>= 2;
4570 if (z >= 0x00002) n += 1, z >>= 1;
4571 return n;
4574 static int stbi__bitcount(unsigned int a)
4576 a = (a & 0x55555555) + ((a >> 1) & 0x55555555); // max 2
4577 a = (a & 0x33333333) + ((a >> 2) & 0x33333333); // max 4
4578 a = (a + (a >> 4)) & 0x0f0f0f0f; // max 8 per 4, now 8 bits
4579 a = (a + (a >> 8)); // max 16 per 8 bits
4580 a = (a + (a >> 16)); // max 32 per 8 bits
4581 return a & 0xff;
4584 static int stbi__shiftsigned(int v, int shift, int bits)
4586 int result;
4587 int z=0;
4589 if (shift < 0) v <<= -shift;
4590 else v >>= shift;
4591 result = v;
4593 z = bits;
4594 while (z < 8) {
4595 result += v >> z;
4596 z += bits;
4598 return result;
4601 static stbi_uc *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
4603 stbi_uc *out;
4604 unsigned int mr=0,mg=0,mb=0,ma=0, all_a=255;
4605 stbi_uc pal[256][4];
4606 int psize=0,i,j,compress=0,width;
4607 int bpp, flip_vertically, pad, target, offset, hsz;
4608 if (stbi__get8(s) != 'B' || stbi__get8(s) != 'M') return stbi__errpuc("not BMP", "Corrupt BMP");
4609 stbi__get32le(s); // discard filesize
4610 stbi__get16le(s); // discard reserved
4611 stbi__get16le(s); // discard reserved
4612 offset = stbi__get32le(s);
4613 hsz = stbi__get32le(s);
4614 if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108 && hsz != 124) return stbi__errpuc("unknown BMP", "BMP type not supported: unknown");
4615 if (hsz == 12) {
4616 s->img_x = stbi__get16le(s);
4617 s->img_y = stbi__get16le(s);
4618 } else {
4619 s->img_x = stbi__get32le(s);
4620 s->img_y = stbi__get32le(s);
4622 if (stbi__get16le(s) != 1) return stbi__errpuc("bad BMP", "bad BMP");
4623 bpp = stbi__get16le(s);
4624 if (bpp == 1) return stbi__errpuc("monochrome", "BMP type not supported: 1-bit");
4625 flip_vertically = ((int) s->img_y) > 0;
4626 s->img_y = abs((int) s->img_y);
4627 if (hsz == 12) {
4628 if (bpp < 24)
4629 psize = (offset - 14 - 24) / 3;
4630 } else {
4631 compress = stbi__get32le(s);
4632 if (compress == 1 || compress == 2) return stbi__errpuc("BMP RLE", "BMP type not supported: RLE");
4633 stbi__get32le(s); // discard sizeof
4634 stbi__get32le(s); // discard hres
4635 stbi__get32le(s); // discard vres
4636 stbi__get32le(s); // discard colorsused
4637 stbi__get32le(s); // discard max important
4638 if (hsz == 40 || hsz == 56) {
4639 if (hsz == 56) {
4640 stbi__get32le(s);
4641 stbi__get32le(s);
4642 stbi__get32le(s);
4643 stbi__get32le(s);
4645 if (bpp == 16 || bpp == 32) {
4646 mr = mg = mb = 0;
4647 if (compress == 0) {
4648 if (bpp == 32) {
4649 mr = 0xffu << 16;
4650 mg = 0xffu << 8;
4651 mb = 0xffu << 0;
4652 ma = 0xffu << 24;
4653 all_a = 0; // if all_a is 0 at end, then we loaded alpha channel but it was all 0
4654 } else {
4655 mr = 31u << 10;
4656 mg = 31u << 5;
4657 mb = 31u << 0;
4659 } else if (compress == 3) {
4660 mr = stbi__get32le(s);
4661 mg = stbi__get32le(s);
4662 mb = stbi__get32le(s);
4663 // not documented, but generated by photoshop and handled by mspaint
4664 if (mr == mg && mg == mb) {
4665 // ?!?!?
4666 return stbi__errpuc("bad BMP", "bad BMP");
4668 } else
4669 return stbi__errpuc("bad BMP", "bad BMP");
4671 } else {
4672 STBI_ASSERT(hsz == 108 || hsz == 124);
4673 mr = stbi__get32le(s);
4674 mg = stbi__get32le(s);
4675 mb = stbi__get32le(s);
4676 ma = stbi__get32le(s);
4677 stbi__get32le(s); // discard color space
4678 for (i=0; i < 12; ++i)
4679 stbi__get32le(s); // discard color space parameters
4680 if (hsz == 124) {
4681 stbi__get32le(s); // discard rendering intent
4682 stbi__get32le(s); // discard offset of profile data
4683 stbi__get32le(s); // discard size of profile data
4684 stbi__get32le(s); // discard reserved
4687 if (bpp < 16)
4688 psize = (offset - 14 - hsz) >> 2;
4690 s->img_n = ma ? 4 : 3;
4691 if (req_comp && req_comp >= 3) // we can directly decode 3 or 4
4692 target = req_comp;
4693 else
4694 target = s->img_n; // if they want monochrome, we'll post-convert
4695 out = (stbi_uc *) stbi__malloc(target * s->img_x * s->img_y);
4696 if (!out) return stbi__errpuc("outofmem", "Out of memory");
4697 if (bpp < 16) {
4698 int z=0;
4699 if (psize == 0 || psize > 256) { STBI_FREE(out); return stbi__errpuc("invalid", "Corrupt BMP"); }
4700 for (i=0; i < psize; ++i) {
4701 pal[i][2] = stbi__get8(s);
4702 pal[i][1] = stbi__get8(s);
4703 pal[i][0] = stbi__get8(s);
4704 if (hsz != 12) stbi__get8(s);
4705 pal[i][3] = 255;
4707 stbi__skip(s, offset - 14 - hsz - psize * (hsz == 12 ? 3 : 4));
4708 if (bpp == 4) width = (s->img_x + 1) >> 1;
4709 else if (bpp == 8) width = s->img_x;
4710 else { STBI_FREE(out); return stbi__errpuc("bad bpp", "Corrupt BMP"); }
4711 pad = (-width)&3;
4712 for (j=0; j < (int) s->img_y; ++j) {
4713 for (i=0; i < (int) s->img_x; i += 2) {
4714 int v=stbi__get8(s),v2=0;
4715 if (bpp == 4) {
4716 v2 = v & 15;
4717 v >>= 4;
4719 out[z++] = pal[v][0];
4720 out[z++] = pal[v][1];
4721 out[z++] = pal[v][2];
4722 if (target == 4) out[z++] = 255;
4723 if (i+1 == (int) s->img_x) break;
4724 v = (bpp == 8) ? stbi__get8(s) : v2;
4725 out[z++] = pal[v][0];
4726 out[z++] = pal[v][1];
4727 out[z++] = pal[v][2];
4728 if (target == 4) out[z++] = 255;
4730 stbi__skip(s, pad);
4732 } else {
4733 int rshift=0,gshift=0,bshift=0,ashift=0,rcount=0,gcount=0,bcount=0,acount=0;
4734 int z = 0;
4735 int easy=0;
4736 stbi__skip(s, offset - 14 - hsz);
4737 if (bpp == 24) width = 3 * s->img_x;
4738 else if (bpp == 16) width = 2*s->img_x;
4739 else /* bpp = 32 and pad = 0 */ width=0;
4740 pad = (-width) & 3;
4741 if (bpp == 24) {
4742 easy = 1;
4743 } else if (bpp == 32) {
4744 if (mb == 0xff && mg == 0xff00 && mr == 0x00ff0000 && ma == 0xff000000)
4745 easy = 2;
4747 if (!easy) {
4748 if (!mr || !mg || !mb) { STBI_FREE(out); return stbi__errpuc("bad masks", "Corrupt BMP"); }
4749 // right shift amt to put high bit in position #7
4750 rshift = stbi__high_bit(mr)-7; rcount = stbi__bitcount(mr);
4751 gshift = stbi__high_bit(mg)-7; gcount = stbi__bitcount(mg);
4752 bshift = stbi__high_bit(mb)-7; bcount = stbi__bitcount(mb);
4753 ashift = stbi__high_bit(ma)-7; acount = stbi__bitcount(ma);
4755 for (j=0; j < (int) s->img_y; ++j) {
4756 if (easy) {
4757 for (i=0; i < (int) s->img_x; ++i) {
4758 unsigned char a;
4759 out[z+2] = stbi__get8(s);
4760 out[z+1] = stbi__get8(s);
4761 out[z+0] = stbi__get8(s);
4762 z += 3;
4763 a = (easy == 2 ? stbi__get8(s) : 255);
4764 all_a |= a;
4765 if (target == 4) out[z++] = a;
4767 } else {
4768 for (i=0; i < (int) s->img_x; ++i) {
4769 stbi__uint32 v = (bpp == 16 ? (stbi__uint32) stbi__get16le(s) : stbi__get32le(s));
4770 int a;
4771 out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mr, rshift, rcount));
4772 out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mg, gshift, gcount));
4773 out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mb, bshift, bcount));
4774 a = (ma ? stbi__shiftsigned(v & ma, ashift, acount) : 255);
4775 all_a |= a;
4776 if (target == 4) out[z++] = STBI__BYTECAST(a);
4779 stbi__skip(s, pad);
4783 // if alpha channel is all 0s, replace with all 255s
4784 if (target == 4 && all_a == 0)
4785 for (i=4*s->img_x*s->img_y-1; i >= 0; i -= 4)
4786 out[i] = 255;
4788 if (flip_vertically) {
4789 stbi_uc t;
4790 for (j=0; j < (int) s->img_y>>1; ++j) {
4791 stbi_uc *p1 = out + j *s->img_x*target;
4792 stbi_uc *p2 = out + (s->img_y-1-j)*s->img_x*target;
4793 for (i=0; i < (int) s->img_x*target; ++i) {
4794 t = p1[i], p1[i] = p2[i], p2[i] = t;
4799 if (req_comp && req_comp != target) {
4800 out = stbi__convert_format(out, target, req_comp, s->img_x, s->img_y);
4801 if (out == NULL) return out; // stbi__convert_format frees input on failure
4804 *x = s->img_x;
4805 *y = s->img_y;
4806 if (comp) *comp = s->img_n;
4807 return out;
4809 #endif
4811 // Targa Truevision - TGA
4812 // by Jonathan Dummer
4813 #ifndef STBI_NO_TGA
static int stbi__tga_info(stbi__context *s, int *x, int *y, int *comp)
{
   int tga_w, tga_h, tga_comp;
   int sz;
   stbi__get8(s);                   // discard Offset
   sz = stbi__get8(s);              // color type
   if( sz > 1 ) {
      stbi__rewind(s);
      return 0;      // only RGB or indexed allowed
   }
   sz = stbi__get8(s);              // image type
   // only indexed, RGB or grey allowed, +/- RLE
   if ((sz != 1) && (sz != 2) && (sz != 3) && (sz != 9) && (sz != 10) && (sz != 11)) {
      stbi__rewind(s);              // rewind here too, as on every other failure path
      return 0;
   }
   stbi__skip(s,9);
   tga_w = stbi__get16le(s);
   if( tga_w < 1 ) {
      stbi__rewind(s);
      return 0;   // test width
   }
   tga_h = stbi__get16le(s);
   if( tga_h < 1 ) {
      stbi__rewind(s);
      return 0;   // test height
   }
   sz = stbi__get8(s);              // bits per pixel
   // only RGB or RGBA or grey allowed
   if ((sz != 8) && (sz != 16) && (sz != 24) && (sz != 32)) {
      stbi__rewind(s);
      return 0;
   }
   tga_comp = sz;
   if (x) *x = tga_w;
   if (y) *y = tga_h;
   if (comp) *comp = tga_comp / 8;
   return 1;                   // seems to have passed everything
}

static int stbi__tga_test(stbi__context *s)
{
   int res = 0;
   int sz;
   stbi__get8(s);      //   discard Offset
   sz = stbi__get8(s);   //   color type
   if ( sz > 1 ) goto errorEnd;   //   only RGB or indexed allowed
   sz = stbi__get8(s);   //   image type
   // only indexed, RGB or grey allowed, +/- RLE
   if ( (sz != 1) && (sz != 2) && (sz != 3) && (sz != 9) && (sz != 10) && (sz != 11) ) goto errorEnd;
   stbi__get16be(s);      //   discard palette start
   stbi__get16be(s);      //   discard palette length
   stbi__get8(s);         //   discard bits per palette color entry
   stbi__get16be(s);      //   discard x origin
   stbi__get16be(s);      //   discard y origin
   if ( stbi__get16be(s) < 1 ) goto errorEnd;      //   test width
   if ( stbi__get16be(s) < 1 ) goto errorEnd;      //   test height
   sz = stbi__get8(s);   //   bits per pixel
   if ( (sz != 8) && (sz != 16) && (sz != 24) && (sz != 32) ) goto errorEnd;

   res = 1; // every header check passed

errorEnd:
   // always rewind, so the next format test starts at the beginning of the stream
   stbi__rewind(s);
   return res;
}

4876 static stbi_uc *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
4878 // read in the TGA header stuff
4879 int tga_offset = stbi__get8(s);
4880 int tga_indexed = stbi__get8(s);
4881 int tga_image_type = stbi__get8(s);
4882 int tga_is_RLE = 0;
4883 int tga_palette_start = stbi__get16le(s);
4884 int tga_palette_len = stbi__get16le(s);
4885 int tga_palette_bits = stbi__get8(s);
4886 int tga_x_origin = stbi__get16le(s);
4887 int tga_y_origin = stbi__get16le(s);
4888 int tga_width = stbi__get16le(s);
4889 int tga_height = stbi__get16le(s);
4890 int tga_bits_per_pixel = stbi__get8(s);
4891 int tga_comp = tga_bits_per_pixel / 8;
4892 int tga_inverted = stbi__get8(s);
4893 // image data
4894 unsigned char *tga_data;
4895 unsigned char *tga_palette = NULL;
4896 int i, j;
4897 unsigned char raw_data[4];
4898 int RLE_count = 0;
4899 int RLE_repeating = 0;
4900 int read_next_pixel = 1;
// do a tiny bit of processing
4903 if ( tga_image_type >= 8 )
4905 tga_image_type -= 8;
4906 tga_is_RLE = 1;
4908 /* int tga_alpha_bits = tga_inverted & 15; */
4909 tga_inverted = 1 - ((tga_inverted >> 5) & 1);
4911 // error check
4912 if ( //(tga_indexed) ||
4913 (tga_width < 1) || (tga_height < 1) ||
4914 (tga_image_type < 1) || (tga_image_type > 3) ||
4915 ((tga_bits_per_pixel != 8) && (tga_bits_per_pixel != 16) &&
4916 (tga_bits_per_pixel != 24) && (tga_bits_per_pixel != 32))
4919 return NULL; // we don't report this as a bad TGA because we don't even know if it's TGA
4922 // If I'm paletted, then I'll use the number of bits from the palette
4923 if ( tga_indexed )
4925 tga_comp = tga_palette_bits / 8;
4928 // tga info
4929 *x = tga_width;
4930 *y = tga_height;
4931 if (comp) *comp = tga_comp;
4933 tga_data = (unsigned char*)stbi__malloc( (size_t)tga_width * tga_height * tga_comp );
4934 if (!tga_data) return stbi__errpuc("outofmem", "Out of memory");
4936 // skip to the data's starting position (offset usually = 0)
4937 stbi__skip(s, tga_offset );
4939 if ( !tga_indexed && !tga_is_RLE) {
4940 for (i=0; i < tga_height; ++i) {
4941 int row = tga_inverted ? tga_height -i - 1 : i;
4942 stbi_uc *tga_row = tga_data + row*tga_width*tga_comp;
4943 stbi__getn(s, tga_row, tga_width * tga_comp);
4945 } else {
4946 // do I need to load a palette?
4947 if ( tga_indexed)
4949 // any data to skip? (offset usually = 0)
4950 stbi__skip(s, tga_palette_start );
4951 // load the palette
4952 tga_palette = (unsigned char*)stbi__malloc( tga_palette_len * tga_palette_bits / 8 );
4953 if (!tga_palette) {
4954 STBI_FREE(tga_data);
4955 return stbi__errpuc("outofmem", "Out of memory");
4957 if (!stbi__getn(s, tga_palette, tga_palette_len * tga_palette_bits / 8 )) {
4958 STBI_FREE(tga_data);
4959 STBI_FREE(tga_palette);
4960 return stbi__errpuc("bad palette", "Corrupt TGA");
4963 // load the data
4964 for (i=0; i < tga_width * tga_height; ++i)
// if I'm in RLE mode, do I need to get a RLE chunk?
4967 if ( tga_is_RLE )
4969 if ( RLE_count == 0 )
4971 // yep, get the next byte as a RLE command
4972 int RLE_cmd = stbi__get8(s);
4973 RLE_count = 1 + (RLE_cmd & 127);
4974 RLE_repeating = RLE_cmd >> 7;
4975 read_next_pixel = 1;
4976 } else if ( !RLE_repeating )
4978 read_next_pixel = 1;
4980 } else
4982 read_next_pixel = 1;
4984 // OK, if I need to read a pixel, do it now
4985 if ( read_next_pixel )
4987 // load however much data we did have
4988 if ( tga_indexed )
4990 // read in 1 byte, then perform the lookup
4991 int pal_idx = stbi__get8(s);
4992 if ( pal_idx >= tga_palette_len )
4994 // invalid index
4995 pal_idx = 0;
// tga_comp is the number of bytes per palette entry for indexed images
pal_idx *= tga_comp;
for (j = 0; j < tga_comp; ++j)
{
raw_data[j] = tga_palette[pal_idx+j];
}
5002 } else
5004 // read in the data raw
5005 for (j = 0; j*8 < tga_bits_per_pixel; ++j)
5007 raw_data[j] = stbi__get8(s);
5010 // clear the reading flag for the next pixel
5011 read_next_pixel = 0;
5012 } // end of reading a pixel
5014 // copy data
5015 for (j = 0; j < tga_comp; ++j)
5016 tga_data[i*tga_comp+j] = raw_data[j];
5018 // in case we're in RLE mode, keep counting down
5019 --RLE_count;
5021 // do I need to invert the image?
5022 if ( tga_inverted )
5024 for (j = 0; j*2 < tga_height; ++j)
5026 int index1 = j * tga_width * tga_comp;
5027 int index2 = (tga_height - 1 - j) * tga_width * tga_comp;
5028 for (i = tga_width * tga_comp; i > 0; --i)
5030 unsigned char temp = tga_data[index1];
5031 tga_data[index1] = tga_data[index2];
5032 tga_data[index2] = temp;
5033 ++index1;
5034 ++index2;
5038 // clear my palette, if I had one
5039 if ( tga_palette != NULL )
5041 STBI_FREE( tga_palette );
5045 // swap RGB
5046 if (tga_comp >= 3)
5048 unsigned char* tga_pixel = tga_data;
5049 for (i=0; i < tga_width * tga_height; ++i)
5051 unsigned char temp = tga_pixel[0];
5052 tga_pixel[0] = tga_pixel[2];
5053 tga_pixel[2] = temp;
5054 tga_pixel += tga_comp;
5058 // convert to target component count
5059 if (req_comp && req_comp != tga_comp)
5060 tga_data = stbi__convert_format(tga_data, tga_comp, req_comp, tga_width, tga_height);
5062 // the things I do to get rid of an error message, and yet keep
5063 // Microsoft's C compilers happy... [8^(
5064 tga_palette_start = tga_palette_len = tga_palette_bits =
5065 tga_x_origin = tga_y_origin = 0;
5066 // OK, done
5067 return tga_data;
5069 #endif
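
// Illustration only, not part of stb_image: a minimal sketch of the TGA RLE
// packet layout that stbi__tga_load consumes above. Each packet starts with a
// command byte whose low 7 bits hold (count - 1) and whose high bit selects a
// repeated pixel versus raw pixels. The name example_tga_rle_decode and its
// buffer-based signature are assumptions made for this sketch. Unlike the
// loader above, it rejects a packet that would run past the requested pixel
// count instead of stopping in the middle of it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: expands TGA RLE packets from src into dst, where each
   pixel is bpp bytes. Returns the number of pixels written, or 0 if the
   input is truncated or a packet would overrun pixel_count. */
static size_t example_tga_rle_decode(const unsigned char *src, size_t src_len,
                                     unsigned char *dst, size_t pixel_count,
                                     size_t bpp)
{
   size_t in = 0, out = 0, k, b;
   assert(bpp >= 1 && bpp <= 4);
   while (out < pixel_count) {
      unsigned char cmd;
      size_t count;
      if (in >= src_len) return 0;
      cmd = src[in++];
      count = 1 + (size_t)(cmd & 127);      /* low 7 bits: count minus one */
      if (count > pixel_count - out) return 0;
      if (cmd >> 7) {                        /* high bit set: one pixel, repeated */
         if (src_len - in < bpp) return 0;
         for (k = 0; k < count; ++k)
            for (b = 0; b < bpp; ++b)
               dst[(out + k) * bpp + b] = src[in + b];
         in += bpp;
      } else {                               /* high bit clear: count raw pixels */
         if (src_len - in < count * bpp) return 0;
         for (k = 0; k < count * bpp; ++k)
            dst[out * bpp + k] = src[in + k];
         in += count * bpp;
      }
      out += count;
   }
   return out;
}
```

// For example, the bytes 0x82 10 20 30 expand to three copies of the pixel
// (10,20,30), and 0x01 followed by six bytes yields two raw 3-byte pixels.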
5071 // *************************************************************************************************
5072 // Photoshop PSD loader -- PD by Thatcher Ulrich, integration by Nicolas Schulz, tweaked by STB
5074 #ifndef STBI_NO_PSD
5075 static int stbi__psd_test(stbi__context *s)
5077 int r = (stbi__get32be(s) == 0x38425053);
5078 stbi__rewind(s);
5079 return r;
5082 static stbi_uc *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
5084 int pixelCount;
5085 int channelCount, compression;
5086 int channel, i, count, len;
5087 int bitdepth;
5088 int w,h;
5089 stbi_uc *out;
5091 // Check identifier
5092 if (stbi__get32be(s) != 0x38425053) // "8BPS"
5093 return stbi__errpuc("not PSD", "Corrupt PSD image");
5095 // Check file type version.
5096 if (stbi__get16be(s) != 1)
5097 return stbi__errpuc("wrong version", "Unsupported version of PSD image");
5099 // Skip 6 reserved bytes.
5100 stbi__skip(s, 6 );
5102 // Read the number of channels (R, G, B, A, etc).
5103 channelCount = stbi__get16be(s);
5104 if (channelCount < 0 || channelCount > 16)
5105 return stbi__errpuc("wrong channel count", "Unsupported number of channels in PSD image");
5107 // Read the rows and columns of the image.
5108 h = stbi__get32be(s);
5109 w = stbi__get32be(s);
// Make sure the depth is 8 or 16 bits.
5112 bitdepth = stbi__get16be(s);
5113 if (bitdepth != 8 && bitdepth != 16)
5114 return stbi__errpuc("unsupported bit depth", "PSD bit depth is not 8 or 16 bit");
5116 // Make sure the color mode is RGB.
5117 // Valid options are:
5118 // 0: Bitmap
5119 // 1: Grayscale
5120 // 2: Indexed color
5121 // 3: RGB color
5122 // 4: CMYK color
5123 // 7: Multichannel
5124 // 8: Duotone
5125 // 9: Lab color
5126 if (stbi__get16be(s) != 3)
5127 return stbi__errpuc("wrong color format", "PSD is not in RGB color format");
5129 // Skip the Mode Data. (It's the palette for indexed color; other info for other modes.)
5130 stbi__skip(s,stbi__get32be(s) );
5132 // Skip the image resources. (resolution, pen tool paths, etc)
5133 stbi__skip(s, stbi__get32be(s) );
// Skip the layer and mask information section.
5136 stbi__skip(s, stbi__get32be(s) );
5138 // Find out if the data is compressed.
5139 // Known values:
5140 // 0: no compression
5141 // 1: RLE compressed
5142 compression = stbi__get16be(s);
5143 if (compression > 1)
5144 return stbi__errpuc("bad compression", "PSD has an unknown compression format");
5146 // Create the destination image.
5147 out = (stbi_uc *) stbi__malloc(4 * w*h);
5148 if (!out) return stbi__errpuc("outofmem", "Out of memory");
5149 pixelCount = w*h;
5151 // Initialize the data to zero.
5152 //memset( out, 0, pixelCount * 4 );
5154 // Finally, the image data.
5155 if (compression) {
5156 // RLE as used by .PSD and .TIFF
5157 // Loop until you get the number of unpacked bytes you are expecting:
5158 // Read the next source byte into n.
5159 // If n is between 0 and 127 inclusive, copy the next n+1 bytes literally.
5160 // Else if n is between -127 and -1 inclusive, copy the next byte -n+1 times.
// Else if n is -128 (byte value 128), noop.
5162 // Endloop
// The RLE-compressed data is preceded by a 2-byte data count for each row in the data,
// which we're going to just skip.
5166 stbi__skip(s, h * channelCount * 2 );
5168 // Read the RLE data by channel.
5169 for (channel = 0; channel < 4; channel++) {
5170 stbi_uc *p;
5172 p = out+channel;
5173 if (channel >= channelCount) {
5174 // Fill this channel with default data.
5175 for (i = 0; i < pixelCount; i++, p += 4)
5176 *p = (channel == 3 ? 255 : 0);
5177 } else {
5178 // Read the RLE data.
5179 count = 0;
5180 while (count < pixelCount) {
5181 len = stbi__get8(s);
5182 if (len == 128) {
5183 // No-op.
5184 } else if (len < 128) {
5185 // Copy next len+1 bytes literally.
5186 len++;
5187 count += len;
5188 while (len) {
5189 *p = stbi__get8(s);
5190 p += 4;
5191 len--;
5193 } else if (len > 128) {
5194 stbi_uc val;
5195 // Next -len+1 bytes in the dest are replicated from next source byte.
5196 // (Interpret len as a negative 8-bit int.)
5197 len ^= 0x0FF;
5198 len += 2;
5199 val = stbi__get8(s);
5200 count += len;
5201 while (len) {
5202 *p = val;
5203 p += 4;
5204 len--;
5211 } else {
// We're at the raw image data. It's each channel in order (Red, Green, Blue, Alpha, ...)
// where each channel consists of an 8-bit or 16-bit value for each pixel in the image.
5215 // Read the data by channel.
5216 for (channel = 0; channel < 4; channel++) {
5217 stbi_uc *p;
5219 p = out + channel;
5220 if (channel >= channelCount) {
5221 // Fill this channel with default data.
5222 stbi_uc val = channel == 3 ? 255 : 0;
5223 for (i = 0; i < pixelCount; i++, p += 4)
5224 *p = val;
5225 } else {
5226 // Read the data.
5227 if (bitdepth == 16) {
5228 for (i = 0; i < pixelCount; i++, p += 4)
5229 *p = (stbi_uc) (stbi__get16be(s) >> 8);
5230 } else {
5231 for (i = 0; i < pixelCount; i++, p += 4)
5232 *p = stbi__get8(s);
5238 if (req_comp && req_comp != 4) {
5239 out = stbi__convert_format(out, 4, req_comp, w, h);
5240 if (out == NULL) return out; // stbi__convert_format frees input on failure
5243 if (comp) *comp = 4;
5244 *y = h;
5245 *x = w;
5247 return out;
5249 #endif
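
// Illustration only, not part of stb_image: a minimal sketch of the PackBits
// style RLE described in the comments of stbi__psd_load above. The name
// example_packbits_decode and its flat (non-interleaved) output buffer are
// assumptions made for this sketch; the loader above writes with a stride of
// 4 bytes and does not bounds-check runs the way this version does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: decodes PackBits data from src until out_len bytes
   have been produced. Returns 1 on success, 0 if the input is truncated or
   a run would write past out_len. */
static int example_packbits_decode(const unsigned char *src, size_t src_len,
                                   unsigned char *dst, size_t out_len)
{
   size_t in = 0, out = 0;
   while (out < out_len) {
      int n;
      size_t count, k;
      if (in >= src_len) return 0;
      n = src[in++];
      if (n >= 128) n -= 256;          /* reinterpret the byte as signed 8-bit */
      if (n == -128) continue;         /* no-op */
      if (n >= 0) {                    /* copy the next n+1 bytes literally */
         count = (size_t) n + 1;
         if (count > out_len - out || count > src_len - in) return 0;
         for (k = 0; k < count; ++k) dst[out++] = src[in++];
      } else {                         /* repeat the next byte -n+1 times */
         count = (size_t) (-n) + 1;
         if (count > out_len - out || in >= src_len) return 0;
         for (k = 0; k < count; ++k) dst[out++] = src[in];
         ++in;
      }
   }
   return 1;
}
```

// For example, 0x02 'a' 'b' 'c' is a 3-byte literal run, and 0xFE 'x'
// (n = -2) produces three copies of 'x'.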
5251 // *************************************************************************************************
5252 // Softimage PIC loader
5253 // by Tom Seddon
5255 // See http://softimage.wiki.softimage.com/index.php/INFO:_PIC_file_format
5256 // See http://ozviz.wasp.uwa.edu.au/~pbourke/dataformats/softimagepic/
5258 #ifndef STBI_NO_PIC
5259 static int stbi__pic_is4(stbi__context *s,const char *str)
5261 int i;
5262 for (i=0; i<4; ++i)
5263 if (stbi__get8(s) != (stbi_uc)str[i])
5264 return 0;
5266 return 1;
5269 static int stbi__pic_test_core(stbi__context *s)
5271 int i;
5273 if (!stbi__pic_is4(s,"\x53\x80\xF6\x34"))
5274 return 0;
5276 for(i=0;i<84;++i)
5277 stbi__get8(s);
5279 if (!stbi__pic_is4(s,"PICT"))
5280 return 0;
5282 return 1;
5285 typedef struct
5287 stbi_uc size,type,channel;
5288 } stbi__pic_packet;
5290 static stbi_uc *stbi__readval(stbi__context *s, int channel, stbi_uc *dest)
5292 int mask=0x80, i;
5294 for (i=0; i<4; ++i, mask>>=1) {
5295 if (channel & mask) {
5296 if (stbi__at_eof(s)) return stbi__errpuc("bad file","PIC file too short");
5297 dest[i]=stbi__get8(s);
5301 return dest;
5304 static void stbi__copyval(int channel,stbi_uc *dest,const stbi_uc *src)
5306 int mask=0x80,i;
5308 for (i=0;i<4; ++i, mask>>=1)
5309 if (channel&mask)
5310 dest[i]=src[i];
5313 static stbi_uc *stbi__pic_load_core(stbi__context *s,int width,int height,int *comp, stbi_uc *result)
5315 int act_comp=0,num_packets=0,y,chained;
5316 stbi__pic_packet packets[10];
5318 // this will (should...) cater for even some bizarre stuff like having data
5319 // for the same channel in multiple packets.
5320 do {
5321 stbi__pic_packet *packet;
5323 if (num_packets==sizeof(packets)/sizeof(packets[0]))
5324 return stbi__errpuc("bad format","too many packets");
5326 packet = &packets[num_packets++];
5328 chained = stbi__get8(s);
5329 packet->size = stbi__get8(s);
5330 packet->type = stbi__get8(s);
5331 packet->channel = stbi__get8(s);
5333 act_comp |= packet->channel;
5335 if (stbi__at_eof(s)) return stbi__errpuc("bad file","file too short (reading packets)");
5336 if (packet->size != 8) return stbi__errpuc("bad format","packet isn't 8bpp");
5337 } while (chained);
5339 *comp = (act_comp & 0x10 ? 4 : 3); // has alpha channel?
5341 for(y=0; y<height; ++y) {
5342 int packet_idx;
5344 for(packet_idx=0; packet_idx < num_packets; ++packet_idx) {
5345 stbi__pic_packet *packet = &packets[packet_idx];
5346 stbi_uc *dest = result+y*width*4;
5348 switch (packet->type) {
5349 default:
5350 return stbi__errpuc("bad format","packet has bad compression type");
5352 case 0: {//uncompressed
5353 int x;
5355 for(x=0;x<width;++x, dest+=4)
5356 if (!stbi__readval(s,packet->channel,dest))
5357 return 0;
5358 break;
5361 case 1://Pure RLE
5363 int left=width, i;
5365 while (left>0) {
5366 stbi_uc count,value[4];
5368 count=stbi__get8(s);
5369 if (stbi__at_eof(s)) return stbi__errpuc("bad file","file too short (pure read count)");
5371 if (count > left)
5372 count = (stbi_uc) left;
5374 if (!stbi__readval(s,packet->channel,value)) return 0;
5376 for(i=0; i<count; ++i,dest+=4)
5377 stbi__copyval(packet->channel,dest,value);
5378 left -= count;
5381 break;
5383 case 2: {//Mixed RLE
5384 int left=width;
5385 while (left>0) {
5386 int count = stbi__get8(s), i;
5387 if (stbi__at_eof(s)) return stbi__errpuc("bad file","file too short (mixed read count)");
5389 if (count >= 128) { // Repeated
5390 stbi_uc value[4];
5392 if (count==128)
5393 count = stbi__get16be(s);
5394 else
5395 count -= 127;
5396 if (count > left)
5397 return stbi__errpuc("bad file","scanline overrun");
5399 if (!stbi__readval(s,packet->channel,value))
5400 return 0;
5402 for(i=0;i<count;++i, dest += 4)
5403 stbi__copyval(packet->channel,dest,value);
5404 } else { // Raw
5405 ++count;
5406 if (count>left) return stbi__errpuc("bad file","scanline overrun");
5408 for(i=0;i<count;++i, dest+=4)
5409 if (!stbi__readval(s,packet->channel,dest))
5410 return 0;
5412 left-=count;
5414 break;
5420 return result;
static stbi_uc *stbi__pic_load(stbi__context *s,int *px,int *py,int *comp,int req_comp)
{
   stbi_uc *result;
   int i, x,y;

   for (i=0; i<92; ++i)
      stbi__get8(s);

   x = stbi__get16be(s);
   y = stbi__get16be(s);
   if (stbi__at_eof(s))  return stbi__errpuc("bad file","file too short (pic header)");
   // guard the division: a zero width would otherwise divide by zero
   if (x != 0 && (1 << 28) / x < y) return stbi__errpuc("too large", "Image too large to decode");

   stbi__get32be(s); //skip `ratio'
   stbi__get16be(s); //skip `fields'
   stbi__get16be(s); //skip `pad'

   // intermediate buffer is RGBA
   result = (stbi_uc *) stbi__malloc(x*y*4);
   if (!result) return stbi__errpuc("outofmem", "Out of memory");
   memset(result, 0xff, x*y*4);

   if (!stbi__pic_load_core(s,x,y,comp, result)) {
      STBI_FREE(result);
      return 0; // failure reason already set by stbi__pic_load_core; don't convert a NULL buffer
   }
   *px = x;
   *py = y;
   if (req_comp == 0) req_comp = *comp;
   result=stbi__convert_format(result,4,req_comp,x,y);

   return result;
}

5456 static int stbi__pic_test(stbi__context *s)
5458 int r = stbi__pic_test_core(s);
5459 stbi__rewind(s);
5460 return r;
5462 #endif
5464 // *************************************************************************************************
5465 // GIF loader -- public domain by Jean-Marc Lienher -- simplified/shrunk by stb
5467 #ifndef STBI_NO_GIF
5468 typedef struct
5470 stbi__int16 prefix;
5471 stbi_uc first;
5472 stbi_uc suffix;
5473 } stbi__gif_lzw;
5475 typedef struct
5477 int w,h;
5478 stbi_uc *out, *old_out; // output buffer (always 4 components)
5479 int flags, bgindex, ratio, transparent, eflags, delay;
5480 stbi_uc pal[256][4];
5481 stbi_uc lpal[256][4];
5482 stbi__gif_lzw codes[4096];
5483 stbi_uc *color_table;
5484 int parse, step;
5485 int lflags;
5486 int start_x, start_y;
5487 int max_x, max_y;
5488 int cur_x, cur_y;
5489 int line_size;
5490 } stbi__gif;
5492 static int stbi__gif_test_raw(stbi__context *s)
5494 int sz;
5495 if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8') return 0;
5496 sz = stbi__get8(s);
5497 if (sz != '9' && sz != '7') return 0;
5498 if (stbi__get8(s) != 'a') return 0;
5499 return 1;
5502 static int stbi__gif_test(stbi__context *s)
5504 int r = stbi__gif_test_raw(s);
5505 stbi__rewind(s);
5506 return r;
5509 static void stbi__gif_parse_colortable(stbi__context *s, stbi_uc pal[256][4], int num_entries, int transp)
5511 int i;
5512 for (i=0; i < num_entries; ++i) {
5513 pal[i][2] = stbi__get8(s);
5514 pal[i][1] = stbi__get8(s);
5515 pal[i][0] = stbi__get8(s);
5516 pal[i][3] = transp == i ? 0 : 255;
5520 static int stbi__gif_header(stbi__context *s, stbi__gif *g, int *comp, int is_info)
5522 stbi_uc version;
5523 if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8')
5524 return stbi__err("not GIF", "Corrupt GIF");
5526 version = stbi__get8(s);
5527 if (version != '7' && version != '9') return stbi__err("not GIF", "Corrupt GIF");
5528 if (stbi__get8(s) != 'a') return stbi__err("not GIF", "Corrupt GIF");
5530 stbi__g_failure_reason = "";
5531 g->w = stbi__get16le(s);
5532 g->h = stbi__get16le(s);
5533 g->flags = stbi__get8(s);
5534 g->bgindex = stbi__get8(s);
5535 g->ratio = stbi__get8(s);
5536 g->transparent = -1;
5538 if (comp != 0) *comp = 4; // can't actually tell whether it's 3 or 4 until we parse the comments
5540 if (is_info) return 1;
5542 if (g->flags & 0x80)
5543 stbi__gif_parse_colortable(s,g->pal, 2 << (g->flags & 7), -1);
5545 return 1;
5548 static int stbi__gif_info_raw(stbi__context *s, int *x, int *y, int *comp)
5550 stbi__gif g;
5551 if (!stbi__gif_header(s, &g, comp, 1)) {
5552 stbi__rewind( s );
5553 return 0;
5555 if (x) *x = g.w;
5556 if (y) *y = g.h;
5557 return 1;
5560 static void stbi__out_gif_code(stbi__gif *g, stbi__uint16 code)
5562 stbi_uc *p, *c;
5564 // recurse to decode the prefixes, since the linked-list is backwards,
5565 // and working backwards through an interleaved image would be nasty
5566 if (g->codes[code].prefix >= 0)
5567 stbi__out_gif_code(g, g->codes[code].prefix);
5569 if (g->cur_y >= g->max_y) return;
5571 p = &g->out[g->cur_x + g->cur_y];
5572 c = &g->color_table[g->codes[code].suffix * 4];
5574 if (c[3] >= 128) {
5575 p[0] = c[2];
5576 p[1] = c[1];
5577 p[2] = c[0];
5578 p[3] = c[3];
5580 g->cur_x += 4;
5582 if (g->cur_x >= g->max_x) {
5583 g->cur_x = g->start_x;
5584 g->cur_y += g->step;
5586 while (g->cur_y >= g->max_y && g->parse > 0) {
5587 g->step = (1 << g->parse) * g->line_size;
5588 g->cur_y = g->start_y + (g->step >> 1);
5589 --g->parse;
5594 static stbi_uc *stbi__process_gif_raster(stbi__context *s, stbi__gif *g)
5596 stbi_uc lzw_cs;
5597 stbi__int32 len, init_code;
5598 stbi__uint32 first;
5599 stbi__int32 codesize, codemask, avail, oldcode, bits, valid_bits, clear;
5600 stbi__gif_lzw *p;
5602 lzw_cs = stbi__get8(s);
5603 if (lzw_cs > 12) return NULL;
5604 clear = 1 << lzw_cs;
5605 first = 1;
5606 codesize = lzw_cs + 1;
5607 codemask = (1 << codesize) - 1;
5608 bits = 0;
5609 valid_bits = 0;
5610 for (init_code = 0; init_code < clear; init_code++) {
5611 g->codes[init_code].prefix = -1;
5612 g->codes[init_code].first = (stbi_uc) init_code;
5613 g->codes[init_code].suffix = (stbi_uc) init_code;
5616 // support no starting clear code
5617 avail = clear+2;
5618 oldcode = -1;
5620 len = 0;
5621 for(;;) {
5622 if (valid_bits < codesize) {
5623 if (len == 0) {
5624 len = stbi__get8(s); // start new block
5625 if (len == 0)
5626 return g->out;
5628 --len;
5629 bits |= (stbi__int32) stbi__get8(s) << valid_bits;
5630 valid_bits += 8;
5631 } else {
5632 stbi__int32 code = bits & codemask;
5633 bits >>= codesize;
5634 valid_bits -= codesize;
5635 // @OPTIMIZE: is there some way we can accelerate the non-clear path?
5636 if (code == clear) { // clear code
5637 codesize = lzw_cs + 1;
5638 codemask = (1 << codesize) - 1;
5639 avail = clear + 2;
5640 oldcode = -1;
5641 first = 0;
5642 } else if (code == clear + 1) { // end of stream code
5643 stbi__skip(s, len);
5644 while ((len = stbi__get8(s)) > 0)
5645 stbi__skip(s,len);
5646 return g->out;
5647 } else if (code <= avail) {
5648 if (first) return stbi__errpuc("no clear code", "Corrupt GIF");
5650 if (oldcode >= 0) {
5651 p = &g->codes[avail++];
5652 if (avail > 4096) return stbi__errpuc("too many codes", "Corrupt GIF");
5653 p->prefix = (stbi__int16) oldcode;
5654 p->first = g->codes[oldcode].first;
5655 p->suffix = (code == avail) ? p->first : g->codes[code].first;
5656 } else if (code == avail)
5657 return stbi__errpuc("illegal code in raster", "Corrupt GIF");
5659 stbi__out_gif_code(g, (stbi__uint16) code);
5661 if ((avail & codemask) == 0 && avail <= 0x0FFF) {
5662 codesize++;
5663 codemask = (1 << codesize) - 1;
5666 oldcode = code;
5667 } else {
5668 return stbi__errpuc("illegal code in raster", "Corrupt GIF");
5674 static void stbi__fill_gif_background(stbi__gif *g, int x0, int y0, int x1, int y1)
5676 int x, y;
5677 stbi_uc *c = g->pal[g->bgindex];
5678 for (y = y0; y < y1; y += 4 * g->w) {
5679 for (x = x0; x < x1; x += 4) {
5680 stbi_uc *p = &g->out[y + x];
5681 p[0] = c[2];
5682 p[1] = c[1];
5683 p[2] = c[0];
5684 p[3] = 0;
5689 // this function is designed to support animated gifs, although stb_image doesn't support it
5690 static stbi_uc *stbi__gif_load_next(stbi__context *s, stbi__gif *g, int *comp, int req_comp)
5692 int i;
5693 stbi_uc *prev_out = 0;
5695 if (g->out == 0 && !stbi__gif_header(s, g, comp,0))
5696 return 0; // stbi__g_failure_reason set by stbi__gif_header
5698 prev_out = g->out;
5699 g->out = (stbi_uc *) stbi__malloc(4 * g->w * g->h);
5700 if (g->out == 0) return stbi__errpuc("outofmem", "Out of memory");
5702 switch ((g->eflags & 0x1C) >> 2) {
5703 case 0: // unspecified (also always used on 1st frame)
5704 stbi__fill_gif_background(g, 0, 0, 4 * g->w, 4 * g->w * g->h);
5705 break;
5706 case 1: // do not dispose
5707 if (prev_out) memcpy(g->out, prev_out, 4 * g->w * g->h);
5708 g->old_out = prev_out;
5709 break;
5710 case 2: // dispose to background
5711 if (prev_out) memcpy(g->out, prev_out, 4 * g->w * g->h);
5712 stbi__fill_gif_background(g, g->start_x, g->start_y, g->max_x, g->max_y);
5713 break;
5714 case 3: // dispose to previous
5715 if (g->old_out) {
5716 for (i = g->start_y; i < g->max_y; i += 4 * g->w)
5717 memcpy(&g->out[i + g->start_x], &g->old_out[i + g->start_x], g->max_x - g->start_x);
5719 break;
5722 for (;;) {
5723 switch (stbi__get8(s)) {
5724 case 0x2C: /* Image Descriptor */
5726 int prev_trans = -1;
5727 stbi__int32 x, y, w, h;
5728 stbi_uc *o;
5730 x = stbi__get16le(s);
5731 y = stbi__get16le(s);
5732 w = stbi__get16le(s);
5733 h = stbi__get16le(s);
5734 if (((x + w) > (g->w)) || ((y + h) > (g->h)))
5735 return stbi__errpuc("bad Image Descriptor", "Corrupt GIF");
5737 g->line_size = g->w * 4;
5738 g->start_x = x * 4;
5739 g->start_y = y * g->line_size;
5740 g->max_x = g->start_x + w * 4;
5741 g->max_y = g->start_y + h * g->line_size;
5742 g->cur_x = g->start_x;
5743 g->cur_y = g->start_y;
5745 g->lflags = stbi__get8(s);
5747 if (g->lflags & 0x40) {
5748 g->step = 8 * g->line_size; // first interlaced spacing
5749 g->parse = 3;
5750 } else {
5751 g->step = g->line_size;
5752 g->parse = 0;
5755 if (g->lflags & 0x80) {
5756 stbi__gif_parse_colortable(s,g->lpal, 2 << (g->lflags & 7), g->eflags & 0x01 ? g->transparent : -1);
5757 g->color_table = (stbi_uc *) g->lpal;
5758 } else if (g->flags & 0x80) {
5759 if (g->transparent >= 0 && (g->eflags & 0x01)) {
5760 prev_trans = g->pal[g->transparent][3];
5761 g->pal[g->transparent][3] = 0;
5763 g->color_table = (stbi_uc *) g->pal;
5764 } else
5765 return stbi__errpuc("missing color table", "Corrupt GIF");
5767 o = stbi__process_gif_raster(s, g);
5768 if (o == NULL) return NULL;
5770 if (prev_trans != -1)
5771 g->pal[g->transparent][3] = (stbi_uc) prev_trans;
5773 return o;
case 0x21: // Extension Introducer (Graphic Control, Comment, etc.)
5778 int len;
5779 if (stbi__get8(s) == 0xF9) { // Graphic Control Extension.
5780 len = stbi__get8(s);
5781 if (len == 4) {
5782 g->eflags = stbi__get8(s);
5783 g->delay = stbi__get16le(s);
5784 g->transparent = stbi__get8(s);
5785 } else {
5786 stbi__skip(s, len);
5787 break;
5790 while ((len = stbi__get8(s)) != 0)
5791 stbi__skip(s, len);
5792 break;
5795 case 0x3B: // gif stream termination code
5796 return (stbi_uc *) s; // using '1' causes warning on some compilers
5798 default:
5799 return stbi__errpuc("unknown code", "Corrupt GIF");
5803 STBI_NOTUSED(req_comp);
5806 static stbi_uc *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
5808 stbi_uc *u = 0;
5809 stbi__gif g;
5810 memset(&g, 0, sizeof(g));
5812 u = stbi__gif_load_next(s, &g, comp, req_comp);
5813 if (u == (stbi_uc *) s) u = 0; // end of animated gif marker
5814 if (u) {
5815 *x = g.w;
5816 *y = g.h;
5817 if (req_comp && req_comp != 4)
5818 u = stbi__convert_format(u, 4, req_comp, g.w, g.h);
5820 else if (g.out)
5821 STBI_FREE(g.out);
5823 return u;
5826 static int stbi__gif_info(stbi__context *s, int *x, int *y, int *comp)
5828 return stbi__gif_info_raw(s,x,y,comp);
5830 #endif
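
// Illustration only, not part of stb_image: a minimal sketch of the row order
// that the step/parse bookkeeping in stbi__out_gif_code produces for an
// interlaced GIF frame (rows 0,8,16,... then 4,12,... then 2,6,... then
// 1,3,...). The name example_gif_interlace_rows is an assumption made for
// this sketch, and it works in row indices rather than the byte offsets the
// decoder above uses.

```c
#include <assert.h>

/* Hypothetical helper: fills rows[0..height-1] with the destination row of
   each successive decoded scanline of an interlaced GIF image. */
static void example_gif_interlace_rows(int height, int *rows)
{
   int y = 0, step = 8, parse = 3, n = 0;
   assert(height >= 0);
   while (n < height) {
      rows[n++] = y;
      y += step;
      /* when a pass runs off the bottom, start the next, finer pass */
      while (y >= height && parse > 0) {
         step = 1 << parse;   /* 8, 4, 2 */
         y = step >> 1;       /* first row of that pass: 4, 2, 1 */
         --parse;
      }
   }
}
```

// For an 8-row image the scanlines land on rows 0, 4, 2, 6, 1, 3, 5, 7.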
5832 // *************************************************************************************************
5833 // Radiance RGBE HDR loader
5834 // originally by Nicolas Schulz
5835 #ifndef STBI_NO_HDR
5836 static int stbi__hdr_test_core(stbi__context *s)
5838 const char *signature = "#?RADIANCE\n";
5839 int i;
5840 for (i=0; signature[i]; ++i)
5841 if (stbi__get8(s) != signature[i])
5842 return 0;
5843 return 1;
5846 static int stbi__hdr_test(stbi__context* s)
5848 int r = stbi__hdr_test_core(s);
5849 stbi__rewind(s);
5850 return r;
#define STBI__HDR_BUFLEN  1024
static char *stbi__hdr_gettoken(stbi__context *z, char *buffer)
{
   int len=0;
   char c = '\0';

   c = (char) stbi__get8(z);

   while (!stbi__at_eof(z) && c != '\n') {
      buffer[len++] = c;
      if (len == STBI__HDR_BUFLEN-1) {
         // flush to end of line
         while (!stbi__at_eof(z) && stbi__get8(z) != '\n')
            ;
         break;
      }
      c = (char) stbi__get8(z);
   }

   buffer[len] = 0;
   return buffer;
}

5876 static void stbi__hdr_convert(float *output, stbi_uc *input, int req_comp)
5878 if ( input[3] != 0 ) {
5879 float f1;
5880 // Exponent
5881 f1 = (float) ldexp(1.0f, input[3] - (int)(128 + 8));
5882 if (req_comp <= 2)
5883 output[0] = (input[0] + input[1] + input[2]) * f1 / 3;
5884 else {
5885 output[0] = input[0] * f1;
5886 output[1] = input[1] * f1;
5887 output[2] = input[2] * f1;
5889 if (req_comp == 2) output[1] = 1;
5890 if (req_comp == 4) output[3] = 1;
5891 } else {
5892 switch (req_comp) {
5893 case 4: output[3] = 1; /* fallthrough */
5894 case 3: output[0] = output[1] = output[2] = 0;
5895 break;
5896 case 2: output[1] = 1; /* fallthrough */
5897 case 1: output[0] = 0;
5898 break;
5903 static float *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
5905 char buffer[STBI__HDR_BUFLEN];
5906 char *token;
5907 int valid = 0;
5908 int width, height;
5909 stbi_uc *scanline;
5910 float *hdr_data;
5911 int len;
5912 unsigned char count, value;
5913 int i, j, k, c1,c2, z;
5916 // Check identifier
5917 if (strcmp(stbi__hdr_gettoken(s,buffer), "#?RADIANCE") != 0)
5918 return stbi__errpf("not HDR", "Corrupt HDR image");
5920 // Parse header
5921 for(;;) {
5922 token = stbi__hdr_gettoken(s,buffer);
5923 if (token[0] == 0) break;
5924 if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
5927 if (!valid) return stbi__errpf("unsupported format", "Unsupported HDR format");
5929 // Parse width and height
5930 // can't use sscanf() if we're not using stdio!
5931 token = stbi__hdr_gettoken(s,buffer);
5932 if (strncmp(token, "-Y ", 3)) return stbi__errpf("unsupported data layout", "Unsupported HDR format");
5933 token += 3;
5934 height = (int) strtol(token, &token, 10);
5935 while (*token == ' ') ++token;
5936 if (strncmp(token, "+X ", 3)) return stbi__errpf("unsupported data layout", "Unsupported HDR format");
5937 token += 3;
5938 width = (int) strtol(token, NULL, 10);
5940 *x = width;
5941 *y = height;
5943 if (comp) *comp = 3;
5944 if (req_comp == 0) req_comp = 3;
// Read data
hdr_data = (float *) stbi__malloc(height * width * req_comp * sizeof(float));
if (!hdr_data) return stbi__errpf("outofmem", "Out of memory");

// Load image data
// image data is stored as some number of scanlines
5951 if ( width < 8 || width >= 32768) {
5952 // Read flat data
5953 for (j=0; j < height; ++j) {
5954 for (i=0; i < width; ++i) {
5955 stbi_uc rgbe[4];
5956 main_decode_loop:
5957 stbi__getn(s, rgbe, 4);
5958 stbi__hdr_convert(hdr_data + j * width * req_comp + i * req_comp, rgbe, req_comp);
5961 } else {
5962 // Read RLE-encoded data
5963 scanline = NULL;
5965 for (j = 0; j < height; ++j) {
5966 c1 = stbi__get8(s);
5967 c2 = stbi__get8(s);
5968 len = stbi__get8(s);
5969 if (c1 != 2 || c2 != 2 || (len & 0x80)) {
5970 // not run-length encoded, so we have to actually use THIS data as a decoded
5971 // pixel (note this can't be a valid pixel--one of RGB must be >= 128)
5972 stbi_uc rgbe[4];
5973 rgbe[0] = (stbi_uc) c1;
5974 rgbe[1] = (stbi_uc) c2;
5975 rgbe[2] = (stbi_uc) len;
5976 rgbe[3] = (stbi_uc) stbi__get8(s);
5977 stbi__hdr_convert(hdr_data, rgbe, req_comp);
5978 i = 1;
5979 j = 0;
5980 STBI_FREE(scanline);
5981 goto main_decode_loop; // yes, this makes no sense
5983 len <<= 8;
5984 len |= stbi__get8(s);
5985 if (len != width) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("invalid decoded scanline length", "corrupt HDR"); }
5986 if (scanline == NULL) scanline = (stbi_uc *) stbi__malloc(width * 4);
5988 for (k = 0; k < 4; ++k) {
5989 i = 0;
5990 while (i < width) {
5991 count = stbi__get8(s);
5992 if (count > 128) {
5993 // Run
5994 value = stbi__get8(s);
5995 count -= 128;
5996 for (z = 0; z < count; ++z)
5997 scanline[i++ * 4 + k] = value;
5998 } else {
5999 // Dump
6000 for (z = 0; z < count; ++z)
6001 scanline[i++ * 4 + k] = stbi__get8(s);
6005 for (i=0; i < width; ++i)
6006 stbi__hdr_convert(hdr_data+(j*width + i)*req_comp, scanline + i*4, req_comp);
6008 STBI_FREE(scanline);
6011 return hdr_data;
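/*
   Illustrative sketch, not part of stb_image: the RLE branch above expands
   each of the four RGBE channels of a scanline separately, as a sequence of
   "run" packets (count byte > 128, then one value) and "dump" packets (count
   byte <= 128, then that many literal bytes). The helper below does the same
   for a single channel from an in-memory buffer, and rejects packets that
   would overrun the scanline or the input. The name example_hdr_rle_channel
   and its buffer-based signature are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper for illustration; not an stb_image function.
// Decodes channel k (0..3) of one RLE scanline from src into the interleaved
// RGBE buffer scanline (width * 4 bytes). Returns the number of input bytes
// consumed, or 0 if the data is truncated or a packet overruns the scanline.
static size_t example_hdr_rle_channel(const unsigned char *src, size_t src_len,
                                      unsigned char *scanline, int width, int k)
{
   size_t pos = 0;
   int i = 0;
   assert(src != NULL && scanline != NULL && width > 0 && k >= 0 && k < 4);
   while (i < width) {
      int count, z;
      if (pos >= src_len) return 0;
      count = src[pos++];
      if (count > 128) {
         // Run: one value repeated (count - 128) times
         unsigned char value;
         count -= 128;
         if (count > width - i || pos >= src_len) return 0;
         value = src[pos++];
         for (z = 0; z < count; ++z)
            scanline[i++ * 4 + k] = value;
      } else {
         // Dump: count literal bytes
         if (count == 0 || count > width - i) return 0;
         if ((size_t) count > src_len - pos) return 0;
         for (z = 0; z < count; ++z)
            scanline[i++ * 4 + k] = src[pos++];
      }
   }
   return pos;
}
```

   A caller would invoke this four times per scanline (k = 0, 1, 2, 3),
   advancing src by the returned byte count each time, and only then convert
   the interleaved RGBE bytes to floats.
*/
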
static int stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp)
{
   char buffer[STBI__HDR_BUFLEN];
   char *token;
   int valid = 0;

   if (strcmp(stbi__hdr_gettoken(s,buffer), "#?RADIANCE") != 0) {
       stbi__rewind( s );
       return 0;
   }

   for(;;) {
      token = stbi__hdr_gettoken(s,buffer);
      if (token[0] == 0) break;
      if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
   }

   if (!valid) {
       stbi__rewind( s );
       return 0;
   }
   token = stbi__hdr_gettoken(s,buffer);
   if (strncmp(token, "-Y ", 3)) {
       stbi__rewind( s );
       return 0;
   }
   token += 3;
   *y = (int) strtol(token, &token, 10);
   while (*token == ' ') ++token;
   if (strncmp(token, "+X ", 3)) {
       stbi__rewind( s );
       return 0;
   }
   token += 3;
   *x = (int) strtol(token, NULL, 10);
   *comp = 3;
   return 1;
}
#endif // STBI_NO_HDR

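/*
   Illustrative sketch, not part of stb_image: both HDR routines above accept
   only the "-Y <height> +X <width>" resolution line and parse it with
   strtol(). The helper below shows the same parse as a standalone function
   that also rejects missing digits and non-positive sizes. The name
   example_hdr_parse_resolution and the extra validation are assumptions made
   for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper for illustration; not an stb_image function.
// Parses "-Y <height> +X <width>". Returns 1 and fills *w and *h on success,
// 0 for any other layout, missing digits, or non-positive dimensions.
static int example_hdr_parse_resolution(const char *line, int *w, int *h)
{
   char *end;
   long v;
   assert(line != NULL && w != NULL && h != NULL);
   if (strncmp(line, "-Y ", 3) != 0) return 0;
   line += 3;
   v = strtol(line, &end, 10);
   if (end == line || v <= 0 || v > 0x7fffffffL) return 0;
   *h = (int) v;
   while (*end == ' ') ++end;
   if (strncmp(end, "+X ", 3) != 0) return 0;
   line = end + 3;
   v = strtol(line, &end, 10);
   if (end == line || v <= 0 || v > 0x7fffffffL) return 0;
   *w = (int) v;
   return 1;
}
```

   Checking end == line catches the case where strtol() consumed no digits,
   which the unchecked calls would silently report as a dimension of 0.
*/
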
#ifndef STBI_NO_BMP
static int stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp)
{
   int hsz;
   if (stbi__get8(s) != 'B' || stbi__get8(s) != 'M') {
       stbi__rewind( s );
       return 0;
   }
   stbi__skip(s,12);
   hsz = stbi__get32le(s);
   if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108 && hsz != 124) {
       stbi__rewind( s );
       return 0;
   }
   if (hsz == 12) {
      *x = stbi__get16le(s);
      *y = stbi__get16le(s);
   } else {
      *x = stbi__get32le(s);
      *y = stbi__get32le(s);
   }
   if (stbi__get16le(s) != 1) {
       stbi__rewind( s );
       return 0;
   }
   *comp = stbi__get16le(s) / 8;
   return 1;
}
#endif

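/*
   Illustrative sketch, not part of stb_image: the BMP probe above reads
   little-endian fields at fixed offsets (header size at byte 14, then width,
   height, planes and bits per pixel). The helper below performs the same
   walk over an in-memory buffer with explicit length checks. The names
   example_le16, example_le32 and example_bmp_probe are assumptions made for
   this example; a height with the top bit set (top-down bitmap) is returned
   as the raw unsigned value and not interpreted.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helpers for illustration; not stb_image functions.
static unsigned example_le16(const unsigned char *p)
{
   return (unsigned) p[0] | ((unsigned) p[1] << 8);
}

static unsigned long example_le32(const unsigned char *p)
{
   return (unsigned long) p[0] | ((unsigned long) p[1] << 8) |
          ((unsigned long) p[2] << 16) | ((unsigned long) p[3] << 24);
}

// Returns 1 and fills *w, *h, *bpp if buf starts with a BMP header whose
// info-header size is 12, 40, 56, 108 or 124; returns 0 otherwise.
static int example_bmp_probe(const unsigned char *buf, size_t len,
                             unsigned long *w, unsigned long *h, int *bpp)
{
   unsigned long hsz;
   const unsigned char *p;
   assert(buf != NULL && w != NULL && h != NULL && bpp != NULL);
   if (len < 18 || buf[0] != 'B' || buf[1] != 'M') return 0;
   hsz = example_le32(buf + 14);
   if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108 && hsz != 124) return 0;
   p = buf + 18;
   if (hsz == 12) {
      if (len < 18 + 8) return 0;           // 16-bit width and height
      *w = example_le16(p);
      *h = example_le16(p + 2);
      p += 4;
   } else {
      if (len < 18 + 12) return 0;          // 32-bit width and height
      *w = example_le32(p);
      *h = example_le32(p + 4);
      p += 8;
   }
   if (example_le16(p) != 1) return 0;      // plane count must be 1
   *bpp = (int) example_le16(p + 2);
   return 1;
}
```

   Because the buffer is never consumed, a failed probe needs no rewind; the
   stream-based code above has to call stbi__rewind() on every failure path.
*/
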
#ifndef STBI_NO_PSD
static int stbi__psd_info(stbi__context *s, int *x, int *y, int *comp)
{
   int channelCount;
   // signature: "8BPS"
   if (stbi__get32be(s) != 0x38425053) {
       stbi__rewind( s );
       return 0;
   }
   // file version must be 1
   if (stbi__get16be(s) != 1) {
       stbi__rewind( s );
       return 0;
   }
   // skip 6 reserved bytes
   stbi__skip(s, 6);
   channelCount = stbi__get16be(s);
   if (channelCount < 0 || channelCount > 16) {
       stbi__rewind( s );
       return 0;
   }
   *y = stbi__get32be(s);
   *x = stbi__get32be(s);
   // only 8 bits per channel is accepted here
   if (stbi__get16be(s) != 8) {
       stbi__rewind( s );
       return 0;
   }
   // only color mode 3 (RGB) is accepted
   if (stbi__get16be(s) != 3) {
       stbi__rewind( s );
       return 0;
   }
   *comp = 4;
   return 1;
}
#endif

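/*
   Illustrative sketch, not part of stb_image: the PSD probe above walks a
   fixed 26-byte big-endian header (signature, version, 6 reserved bytes,
   channel count, height, width, depth, color mode). The helper below shows
   that layout as offsets into an in-memory buffer. The names example_be16,
   example_be32 and example_psd_probe are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helpers for illustration; not stb_image functions.
static unsigned example_be16(const unsigned char *p)
{
   return ((unsigned) p[0] << 8) | (unsigned) p[1];
}

static unsigned long example_be32(const unsigned char *p)
{
   return ((unsigned long) p[0] << 24) | ((unsigned long) p[1] << 16) |
          ((unsigned long) p[2] << 8) | (unsigned long) p[3];
}

// Returns 1 and fills *w, *h, *channels for an 8-bit RGB PSD header,
// 0 otherwise.
static int example_psd_probe(const unsigned char *buf, size_t len,
                             unsigned long *w, unsigned long *h, int *channels)
{
   assert(buf != NULL && w != NULL && h != NULL && channels != NULL);
   if (len < 26) return 0;
   if (example_be32(buf) != 0x38425053UL) return 0;   // "8BPS"
   if (example_be16(buf + 4) != 1) return 0;          // version
   *channels = (int) example_be16(buf + 12);          // after 6 reserved bytes
   if (*channels > 16) return 0;
   *h = example_be32(buf + 14);                       // height comes first
   *w = example_be32(buf + 18);
   if (example_be16(buf + 22) != 8) return 0;         // bits per channel
   if (example_be16(buf + 24) != 3) return 0;         // color mode 3 = RGB
   return 1;
}
```

   Note the field order: height precedes width in the header, which is why
   the code above assigns *y before *x.
*/
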
#ifndef STBI_NO_PIC
static int stbi__pic_info(stbi__context *s, int *x, int *y, int *comp)
{
   int act_comp=0,num_packets=0,chained;
   stbi__pic_packet packets[10];

   if (!stbi__pic_is4(s,"\x53\x80\xF6\x34")) {
      stbi__rewind(s);
      return 0;
   }

   stbi__skip(s, 88);

   *x = stbi__get16be(s);
   *y = stbi__get16be(s);
   if (stbi__at_eof(s)) {
      stbi__rewind( s);
      return 0;
   }
   // reject images with more than 2^28 pixels; dividing avoids overflowing x*y
   if ( (*x) != 0 && (1 << 28) / (*x) < (*y)) {
      stbi__rewind( s );
      return 0;
   }

   stbi__skip(s, 8);

   do {
      stbi__pic_packet *packet;

      // too many packets: rewind like every other failure path so that
      // later format probes start from the beginning of the stream
      if (num_packets==sizeof(packets)/sizeof(packets[0])) {
         stbi__rewind( s );
         return 0;
      }

      packet = &packets[num_packets++];
      chained = stbi__get8(s);
      packet->size    = stbi__get8(s);
      packet->type    = stbi__get8(s);
      packet->channel = stbi__get8(s);
      act_comp |= packet->channel;

      if (stbi__at_eof(s)) {
          stbi__rewind( s );
          return 0;
      }
      if (packet->size != 8) {
          stbi__rewind( s );
          return 0;
      }
   } while (chained);

   *comp = (act_comp & 0x10 ? 4 : 3);

   return 1;
}
#endif

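/*
   Illustrative sketch, not part of stb_image: the PIC probe above rejects
   oversized images with the test (1 << 28) / x < y, which compares against
   the pixel limit by division so that x * y itself is never computed. The
   helper below isolates that idiom; the name example_dims_ok and the
   caller-supplied limit are assumptions made for this example.

```c
#include <assert.h>

// Hypothetical helper for illustration; not an stb_image function.
// Returns 1 if w * h <= limit, tested without forming the product (which
// could overflow int), and 0 otherwise or if either dimension is negative.
static int example_dims_ok(int w, int h, int limit)
{
   assert(limit > 0);
   if (w < 0 || h < 0) return 0;
   if (w == 0) return 1;          // zero-width image: nothing to overflow
   return limit / w >= h;         // floor(limit / w) >= h  <=>  w * h <= limit
}
```

   The same check could be applied before any allocation sized from a
   width * height * components product.
*/
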
// *************************************************************************************************
// Portable Gray Map and Portable Pixel Map loader
// by Ken Miller
//
// PGM: http://netpbm.sourceforge.net/doc/pgm.html
// PPM: http://netpbm.sourceforge.net/doc/ppm.html
//
// Known limitations:
//    Does not support comments in the header section
//    Does not support ASCII image data (formats P2 and P3)
//    Does not support 16-bit-per-channel

#ifndef STBI_NO_PNM

static int      stbi__pnm_test(stbi__context *s)
{
   char p, t;
   p = (char) stbi__get8(s);
   t = (char) stbi__get8(s);
   if (p != 'P' || (t != '5' && t != '6')) {
       stbi__rewind( s );
       return 0;
   }
   return 1;
}

static stbi_uc *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
{
   stbi_uc *out;
   if (!stbi__pnm_info(s, (int *)&s->img_x, (int *)&s->img_y, (int *)&s->img_n))
      return 0;
   *x = s->img_x;
   *y = s->img_y;
   *comp = s->img_n;

   out = (stbi_uc *) stbi__malloc(s->img_n * s->img_x * s->img_y);
   if (!out) return stbi__errpuc("outofmem", "Out of memory");
   stbi__getn(s, out, s->img_n * s->img_x * s->img_y);

   if (req_comp && req_comp != s->img_n) {
      out = stbi__convert_format(out, s->img_n, req_comp, s->img_x, s->img_y);
      if (out == NULL) return out; // stbi__convert_format frees input on failure
   }
   return out;
}

static int      stbi__pnm_isspace(char c)
{
   return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r';
}

static void     stbi__pnm_skip_whitespace(stbi__context *s, char *c)
{
   while (!stbi__at_eof(s) && stbi__pnm_isspace(*c))
      *c = (char) stbi__get8(s);
}

static int      stbi__pnm_isdigit(char c)
{
   return c >= '0' && c <= '9';
}

static int      stbi__pnm_getinteger(stbi__context *s, char *c)
{
   int value = 0;

   while (!stbi__at_eof(s) && stbi__pnm_isdigit(*c)) {
      value = value*10 + (*c - '0');
      *c = (char) stbi__get8(s);
   }

   return value;
}

static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp)
{
   int maxv;
   char c, p, t;

   stbi__rewind( s );

   // Get identifier
   p = (char) stbi__get8(s);
   t = (char) stbi__get8(s);
   if (p != 'P' || (t != '5' && t != '6')) {
       stbi__rewind( s );
       return 0;
   }

   *comp = (t == '6') ? 3 : 1;  // '5' is 1-component .pgm; '6' is 3-component .ppm

   c = (char) stbi__get8(s);
   stbi__pnm_skip_whitespace(s, &c);

   *x = stbi__pnm_getinteger(s, &c); // read width
   stbi__pnm_skip_whitespace(s, &c);

   *y = stbi__pnm_getinteger(s, &c); // read height
   stbi__pnm_skip_whitespace(s, &c);

   maxv = stbi__pnm_getinteger(s, &c);  // read max value

   if (maxv > 255)
      return stbi__err("max value > 255", "PPM image not 8-bit");
   else
      return 1;
}
#endif

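/*
   Illustrative sketch, not part of stb_image: the PNM header parsed above is
   the magic "P5" or "P6" followed by three whitespace-separated decimal
   integers (width, height, max value) and one whitespace byte before the
   raster. The helper below parses the same header from an in-memory buffer
   and returns the offset of the first pixel byte. The names example_pnm_int
   and example_pnm_header, and the 65535 cap on each integer, are assumptions
   made for this example; like the loader above, it does not handle header
   comments.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helpers for illustration; not stb_image functions.
static int example_is_space(int c)
{
   return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r';
}

// Skips whitespace, then reads an unsigned decimal starting at buf[*pos].
// Returns -1 if no digit is found or the value exceeds 65535.
static long example_pnm_int(const unsigned char *buf, size_t len, size_t *pos)
{
   long value = 0;
   int digits = 0;
   while (*pos < len && example_is_space(buf[*pos])) ++*pos;
   while (*pos < len && buf[*pos] >= '0' && buf[*pos] <= '9') {
      value = value * 10 + (buf[*pos] - '0');
      if (value > 65535) return -1;
      ++*pos;
      ++digits;
   }
   return digits ? value : -1;
}

// Returns the offset of the first pixel byte, or 0 if the header is not a
// binary PGM/PPM with a max value of at most 255.
static size_t example_pnm_header(const unsigned char *buf, size_t len,
                                 int *w, int *h, int *comp)
{
   size_t pos = 2;
   long a, b, maxv;
   assert(buf != NULL && w != NULL && h != NULL && comp != NULL);
   if (len < 3 || buf[0] != 'P' || (buf[1] != '5' && buf[1] != '6')) return 0;
   *comp = (buf[1] == '6') ? 3 : 1;
   a = example_pnm_int(buf, len, &pos);
   b = example_pnm_int(buf, len, &pos);
   maxv = example_pnm_int(buf, len, &pos);
   if (a <= 0 || b <= 0 || maxv <= 0 || maxv > 255) return 0;
   if (pos >= len || !example_is_space(buf[pos])) return 0;
   *w = (int) a;
   *h = (int) b;
   return pos + 1;   // exactly one whitespace byte separates header and raster
}
```

   Returning -1 when no digit is present distinguishes a missing field from a
   genuine value of 0, which the accumulate-only loop above cannot do.
*/
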
static int stbi__info_main(stbi__context *s, int *x, int *y, int *comp)
{
   #ifndef STBI_NO_JPEG
   if (stbi__jpeg_info(s, x, y, comp)) return 1;
   #endif

   #ifndef STBI_NO_PNG
   if (stbi__png_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_GIF
   if (stbi__gif_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_BMP
   if (stbi__bmp_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_PSD
   if (stbi__psd_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_PIC
   if (stbi__pic_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_PNM
   if (stbi__pnm_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_HDR
   if (stbi__hdr_info(s, x, y, comp))  return 1;
   #endif

   // test tga last because it's a crappy test!
   #ifndef STBI_NO_TGA
   if (stbi__tga_info(s, x, y, comp))
       return 1;
   #endif
   return stbi__err("unknown image type", "Image not of any known type, or corrupt");
}

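/*
   Illustrative sketch, not part of stb_image: the dispatcher above tries
   each format probe in a fixed order and leaves the weakest test for last,
   so a format without a reliable signature cannot claim a file that a
   stronger test would have recognized. The table-driven version below shows
   the same ordering idea on an in-memory buffer. All names here
   (example_buf, example_probe_fn, example_identify and the three probes) are
   assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical types and helpers for illustration; not stb_image API.
typedef struct { const unsigned char *data; size_t len; } example_buf;
typedef int (*example_probe_fn)(const example_buf *b);

// Strong test: first four bytes of the PNG signature.
static int example_is_png(const example_buf *b)
{
   return b->len >= 4 && b->data[0] == 0x89 && b->data[1] == 'P'
       && b->data[2] == 'N' && b->data[3] == 'G';
}

// Strong test: two-byte BMP magic.
static int example_is_bmp(const example_buf *b)
{
   return b->len >= 2 && b->data[0] == 'B' && b->data[1] == 'M';
}

// Weak test standing in for a format with no signature: it accepts any
// non-empty input, so it has to be tried last.
static int example_is_weak(const example_buf *b)
{
   return b->len > 0;
}

// Returns the index of the first probe that accepts the buffer, or -1.
static int example_identify(const example_buf *b, const example_probe_fn *probes, int n)
{
   int i;
   assert(b != NULL && probes != NULL);
   for (i = 0; i < n; ++i)
      if (probes[i](b)) return i;
   return -1;
}
```

   Since these probes only read from a const buffer, no rewind is needed
   between attempts; the stream-based probes above must restore the read
   position themselves whenever they fail.
*/
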
#ifndef STBI_NO_STDIO
STBIDEF int stbi_info(char const *filename, int *x, int *y, int *comp)
{
    FILE *f = stbi__fopen(filename, "rb");
    int result;
    if (!f) return stbi__err("can't fopen", "Unable to open file");
    result = stbi_info_from_file(f, x, y, comp);
    fclose(f);
    return result;
}

STBIDEF int stbi_info_from_file(FILE *f, int *x, int *y, int *comp)
{
   int r;
   stbi__context s;
   long pos = ftell(f);
   stbi__start_file(&s, f);
   r = stbi__info_main(&s,x,y,comp);
   fseek(f,pos,SEEK_SET);
   return r;
}
#endif // !STBI_NO_STDIO

STBIDEF int stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp)
{
   stbi__context s;
   stbi__start_mem(&s,buffer,len);
   return stbi__info_main(&s,x,y,comp);
}

STBIDEF int stbi_info_from_callbacks(stbi_io_callbacks const *c, void *user, int *x, int *y, int *comp)
{
   stbi__context s;
   stbi__start_callbacks(&s, (stbi_io_callbacks *) c, user);
   return stbi__info_main(&s,x,y,comp);
}

#endif // STB_IMAGE_IMPLEMENTATION

/*
   revision history:
      2.08  (2015-09-13) fix to 2.07 cleanup, reading RGB PSD as RGBA
      2.07  (2015-09-13) fix compiler warnings
                         partial animated GIF support
                         limited 16-bit PSD support
                         #ifdef unused functions
                         bug with < 92 byte PIC,PNM,HDR,TGA
      2.06  (2015-04-19) fix bug where PSD returns wrong '*comp' value
      2.05  (2015-04-19) fix bug in progressive JPEG handling, fix warning
      2.04  (2015-04-15) try to re-enable SIMD on MinGW 64-bit
      2.03  (2015-04-12) extra corruption checking (mmozeiko)
                         stbi_set_flip_vertically_on_load (nguillemot)
                         fix NEON support; fix mingw support
      2.02  (2015-01-19) fix incorrect assert, fix warning
      2.01  (2015-01-17) fix various warnings; suppress SIMD on gcc 32-bit without -msse2
      2.00b (2014-12-25) fix STBI_MALLOC in progressive JPEG
      2.00  (2014-12-25) optimize JPG, including x86 SSE2 & NEON SIMD (ryg)
                         progressive JPEG (stb)
                         PGM/PPM support (Ken Miller)
                         STBI_MALLOC,STBI_REALLOC,STBI_FREE
                         GIF bugfix -- seemingly never worked
                         STBI_NO_*, STBI_ONLY_*
      1.48  (2014-12-14) fix incorrectly-named assert()
      1.47  (2014-12-14) 1/2/4-bit PNG support, both direct and paletted (Omar Cornut & stb)
                         optimize PNG (ryg)
                         fix bug in interlaced PNG with user-specified channel count (stb)
      1.46  (2014-08-26)
              fix broken tRNS chunk (colorkey-style transparency) in non-paletted PNG
      1.45  (2014-08-16)
              fix MSVC-ARM internal compiler error by wrapping malloc
      1.44  (2014-08-07)
              various warning fixes from Ronny Chevalier
      1.43  (2014-07-15)
              fix MSVC-only compiler problem in code changed in 1.42
      1.42  (2014-07-09)
              don't define _CRT_SECURE_NO_WARNINGS (affects user code)
              fixes to stbi__cleanup_jpeg path
              added STBI_ASSERT to avoid requiring assert.h
      1.41  (2014-06-25)
              fix search&replace from 1.36 that messed up comments/error messages
      1.40  (2014-06-22)
              fix gcc struct-initialization warning
      1.39  (2014-06-15)
              fix to TGA optimization when req_comp != number of components in TGA;
              fix to GIF loading because BMP wasn't rewinding (whoops, no GIFs in my test suite)
              add support for BMP version 5 (more ignored fields)
      1.38  (2014-06-06)
              suppress MSVC warnings on integer casts truncating values
              fix accidental rename of 'skip' field of I/O
      1.37  (2014-06-04)
              remove duplicate typedef
      1.36  (2014-06-03)
              convert to header file single-file library
              if de-iphone isn't set, load iphone images color-swapped instead of returning NULL
      1.35  (2014-05-27)
              various warnings
              fix broken STBI_SIMD path
              fix bug where stbi_load_from_file no longer left file pointer in correct place
              fix broken non-easy path for 32-bit BMP (possibly never used)
              TGA optimization by Arseny Kapoulkine
      1.34  (unknown)
              use STBI_NOTUSED in stbi__resample_row_generic(), fix one more leak in tga failure case
      1.33  (2011-07-14)
              make stbi_is_hdr work in STBI_NO_HDR (as specified), minor compiler-friendly improvements
      1.32  (2011-07-13)
              support for "info" function for all supported filetypes (SpartanJ)
      1.31  (2011-06-20)
              a few more leak fixes, bug in PNG handling (SpartanJ)
      1.30  (2011-06-11)
              added ability to load files via callbacks to accommodate custom input streams (Ben Wenger)
              removed deprecated format-specific test/load functions
              removed support for installable file formats (stbi_loader) -- would have been broken for IO callbacks anyway
              error cases in bmp and tga give messages and don't leak (Raymond Barbiero, grisha)
              fix inefficiency in decoding 32-bit BMP (David Woo)
      1.29  (2010-08-16)
              various warning fixes from Aurelien Pocheville
      1.28  (2010-08-01)
              fix bug in GIF palette transparency (SpartanJ)
      1.27  (2010-08-01)
              cast-to-stbi_uc to fix warnings
      1.26  (2010-07-24)
              fix bug in file buffering for PNG reported by SpartanJ
      1.25  (2010-07-17)
              refix trans_data warning (Won Chun)
      1.24  (2010-07-12)
              perf improvements reading from files on platforms with lock-heavy fgetc()
              minor perf improvements for jpeg
              deprecated type-specific functions so we'll get feedback if they're needed
              attempt to fix trans_data warning (Won Chun)
      1.23    fixed bug in iPhone support
      1.22  (2010-07-10)
              removed image *writing* support
              stbi_info support from Jetro Lauha
              GIF support from Jean-Marc Lienher
              iPhone PNG-extensions from James Brown
              warning-fixes from Nicolas Schulz and Janez Zemva (i.e. Janez Žemva)
      1.21    fix use of 'stbi_uc' in header (reported by jon blow)
      1.20    added support for Softimage PIC, by Tom Seddon
      1.19    bug in interlaced PNG corruption check (found by ryg)
      1.18  (2008-08-02)
              fix a threading bug (local mutable static)
      1.17    support interlaced PNG
      1.16    major bugfix - stbi__convert_format converted one too many pixels
      1.15    initialize some fields for thread safety
      1.14    fix threadsafe conversion bug
              header-file-only version (#define STBI_HEADER_FILE_ONLY before including)
      1.13    threadsafe
      1.12    const qualifiers in the API
      1.11    Support installable IDCT, colorspace conversion routines
      1.10    Fixes for 64-bit (don't use "unsigned long")
              optimized upsampling by Fabian "ryg" Giesen
      1.09    Fix format-conversion for PSD code (bad global variables!)
      1.08    Thatcher Ulrich's PSD code integrated by Nicolas Schulz
      1.07    attempt to fix C++ warning/errors again
      1.06    attempt to fix C++ warning/errors again
      1.05    fix TGA loading to return correct *comp and use good luminance calc
      1.04    default float alpha is 1, not 255; use 'void *' for stbi_image_free
      1.03    bugfixes to STBI_NO_STDIO, STBI_NO_HDR
      1.02    support for (subset of) HDR files, float interface for preferred access to them
      1.01    fix bug: possible bug in handling right-side up bmps... not sure
              fix bug: the stbi__bmp_load() and stbi__tga_load() functions didn't work at all
      1.00    interface to zlib that skips zlib header
      0.99    correct handling of alpha in palette
      0.98    TGA loader by lonesock; dynamically add loaders (untested)
      0.97    jpeg errors on too large a file; also catch another malloc failure
      0.96    fix detection of invalid v value - particleman@mollyrocket forum
      0.95    during header scan, seek to markers in case of padding
      0.94    STBI_NO_STDIO to disable stdio usage; rename all #defines the same
      0.93    handle jpegtran output; verbose errors
      0.92    read 4,8,16,24,32-bit BMP files of several formats
      0.91    output 24-bit Windows 3.0 BMP files
      0.90    fix a few more warnings; bump version number to approach 1.0
      0.61    bugfixes due to Marc LeBlanc, Christopher Lloyd
      0.60    fix compiling as c++
      0.59    fix warnings: merge Dave Moore's -Wall fixes
      0.58    fix bug: zlib uncompressed mode len/nlen was wrong endian
      0.57    fix bug: jpg last huffman symbol before marker was >9 bits but less than 16 available
      0.56    fix bug: zlib uncompressed mode len vs. nlen
      0.55    fix bug: restart_interval not initialized to 0
      0.54    allow NULL for 'int *comp'
      0.53    fix bug in png 3->4; speedup png decoding
      0.52    png handles req_comp=3,4 directly; minor cleanup; jpeg comments
      0.51    obey req_comp requests, 1-component jpegs return as 1-component,
              on 'test' only check type, not whether we support this variant
      0.50  (2006-11-19)
              first released version
*/