stb_image: update stb_image.h to v2.02
/* stb_image - v2.02 - public domain image loader - http://nothings.org/stb_image.h
                                     no warranty implied; use at your own risk

   Do this:
      #define STB_IMAGE_IMPLEMENTATION
   before you include this file in *one* C or C++ file to create the implementation.

   // i.e. it should look like this:
   #include ...
   #include ...
   #include ...
   #define STB_IMAGE_IMPLEMENTATION
   #include "stb_image.h"

   You can #define STBI_ASSERT(x) before the #include to avoid using assert.h.
   And #define STBI_MALLOC, STBI_REALLOC, and STBI_FREE to avoid using malloc,realloc,free


   QUICK NOTES:
      Primarily of interest to game developers and other people who can
          avoid problematic images and only need the trivial interface

      JPEG baseline & progressive (12 bpc/arithmetic not supported, same as stock IJG lib)
      PNG 1/2/4/8-bit-per-channel (16 bpc not supported)

      TGA (not sure what subset, if a subset)
      BMP non-1bpp, non-RLE
      PSD (composited view only, no extra channels)

      GIF (*comp always reports as 4-channel)
      HDR (radiance rgbE format)
      PIC (Softimage PIC)
      PNM (PPM and PGM binary only)

      - decode from memory or through FILE (define STBI_NO_STDIO to remove code)
      - decode from arbitrary I/O callbacks
      - SIMD acceleration on x86/x64 (SSE2) and ARM (NEON)

   Full documentation under "DOCUMENTATION" below.


   Revision 2.00 release notes:

      - Progressive JPEG is now supported.

      - PPM and PGM binary formats are now supported, thanks to Ken Miller.

      - x86 platforms now make use of SSE2 SIMD instructions for
        JPEG decoding, and ARM platforms can use NEON SIMD if requested.
        This work was done by Fabian "ryg" Giesen. SSE2 is used by
        default, but NEON must be enabled explicitly; see docs.

        With other JPEG optimizations included in this version, we see
        2x speedup on a JPEG on an x86 machine, and a 1.5x speedup
        on a JPEG on an ARM machine, relative to previous versions of this
        library. The same results will not obtain for all JPGs and for all
        x86/ARM machines. (Note that progressive JPEGs are significantly
        slower to decode than regular JPEGs.) This doesn't mean that this
        is the fastest JPEG decoder in the land; rather, it brings it
        closer to parity with standard libraries. If you want the fastest
        decode, look elsewhere. (See "Philosophy" section of docs below.)

        See final bullet items below for more info on SIMD.

      - Added STBI_MALLOC, STBI_REALLOC, and STBI_FREE macros for replacing
        the memory allocator. Unlike other STBI libraries, these macros don't
        support a context parameter, so if you need to pass a context in to
        the allocator, you'll have to store it in a global or a thread-local
        variable.

      - Split existing STBI_NO_HDR flag into two flags, STBI_NO_HDR and
        STBI_NO_LINEAR.
            STBI_NO_HDR:     suppress implementation of .hdr reader format
            STBI_NO_LINEAR:  suppress high-dynamic-range light-linear float API

      - You can suppress implementation of any of the decoders to reduce
        your code footprint by #defining one or more of the following
        symbols before creating the implementation.

            STBI_NO_JPEG
            STBI_NO_PNG
            STBI_NO_BMP
            STBI_NO_PSD
            STBI_NO_TGA
            STBI_NO_GIF
            STBI_NO_HDR
            STBI_NO_PIC
            STBI_NO_PNM   (.ppm and .pgm)

      - You can request *only* certain decoders and suppress all other ones
        (this will be more forward-compatible, as addition of new decoders
        doesn't require you to disable them explicitly):

            STBI_ONLY_JPEG
            STBI_ONLY_PNG
            STBI_ONLY_BMP
            STBI_ONLY_PSD
            STBI_ONLY_TGA
            STBI_ONLY_GIF
            STBI_ONLY_HDR
            STBI_ONLY_PIC
            STBI_ONLY_PNM   (.ppm and .pgm)

         Note that you can define multiples of these, and you will get all
         of them ("only x" and "only y" is interpreted to mean "only x&y").

       - If you use STBI_NO_PNG (or _ONLY_ without PNG), and you still
         want the zlib decoder to be available, #define STBI_SUPPORT_ZLIB

      - Compilation of all SIMD code can be suppressed with
            #define STBI_NO_SIMD
        It should not be necessary to disable SIMD unless you have issues
        compiling (e.g. using an x86 compiler which doesn't support SSE
        intrinsics or that doesn't support the method used to detect
        SSE2 support at run-time), and even those can be reported as
        bugs so I can refine the built-in compile-time checking to be
        smarter.

      - The old STBI_SIMD system which allowed installing a user-defined
        IDCT etc. has been removed. If you need this, don't upgrade. My
        assumption is that almost nobody was doing this, and those who
        were will find the built-in SIMD more satisfactory anyway.

      - RGB values computed for JPEG images are slightly different from
        previous versions of stb_image. (This is due to using less
        integer precision in SIMD.) The C code has been adjusted so
        that the same RGB values will be computed regardless of whether
        SIMD support is available, so your app should always produce
        consistent results. But these results are slightly different from
        previous versions. (Specifically, about 3% of available YCbCr values
        will compute different RGB results from pre-1.49 versions by +-1;
        most of the deviating values are one smaller in the G channel.)

      - If you must produce consistent results with previous versions of
        stb_image, #define STBI_JPEG_OLD and you will get the same results
        you used to; however, you will not get the SIMD speedups for
        the YCbCr-to-RGB conversion step (although you should still see
        significant JPEG speedup from the other changes).

        Please note that STBI_JPEG_OLD is a temporary feature; it will be
        removed in future versions of the library. It is only intended for
        near-term back-compatibility use.


   Latest revision history:
      2.02  (2015-01-19) fix incorrect assert, fix warning
      2.01  (2015-01-17) fix various warnings
      2.00b (2014-12-25) fix STBI_MALLOC in progressive JPEG
      2.00  (2014-12-25) optimize JPEG, including x86 SSE2 & ARM NEON SIMD
                         progressive JPEG
                         PGM/PPM support
                         STBI_MALLOC,STBI_REALLOC,STBI_FREE
                         STBI_NO_*, STBI_ONLY_*
                         GIF bugfix
      1.48  (2014-12-14) fix incorrectly-named assert()
      1.47  (2014-12-14) 1/2/4-bit PNG support (both grayscale and paletted)
                         optimize PNG
                         fix bug in interlaced PNG with user-specified channel count
      1.46  (2014-08-26) fix broken tRNS chunk in non-paletted PNG
      1.45  (2014-08-16) workaround MSVC-ARM internal compiler error by wrapping malloc

   See end of file for full revision history.


 ============================    Contributors    =========================

 Image formats                                Bug fixes & warning fixes
    Sean Barrett (jpeg, png, bmp)                Marc LeBlanc
    Nicolas Schulz (hdr, psd)                    Christpher Lloyd
    Jonathan Dummer (tga)                        Dave Moore
    Jean-Marc Lienher (gif)                      Won Chun
    Tom Seddon (pic)                             the Horde3D community
    Thatcher Ulrich (psd)                        Janez Zemva
    Ken Miller (pgm, ppm)                        Jonathan Blow
                                                 Laurent Gomila
                                                 Aruelien Pocheville
 Extensions, features                            Ryamond Barbiero
    Jetro Lauha (stbi_info)                      David Woo
    Martin "SpartanJ" Golini (stbi_info)         Martin Golini
    James "moose2000" Brown (iPhone PNG)         Roy Eltham
    Ben "Disch" Wenger (io callbacks)            Luke Graham
    Omar Cornut (1/2/4-bit PNG)                  Thomas Ruf
                                                 John Bartholomew
                                                 Ken Hamada
 Optimizations & bugfixes                        Cort Stratton
    Fabian "ryg" Giesen                          Blazej Dariusz Roszkowski
    Arseny Kapoulkine                            Thibault Reuille
                                                 Paul Du Bois
                                                 Guillaume George
  If your name should be here but                Jerry Jansson
  isn't, let Sean know.                          Hayaki Saito
                                                 Johan Duparc
                                                 Ronny Chevalier
                                                 Michal Cichon
                                                 Tero Hanninen
                                                 Sergio Gonzalez
                                                 Cass Everitt
                                                 Engin Manap

License:
   This software is in the public domain. Where that dedication is not
   recognized, you are granted a perpetual, irrevocable license to copy
   and modify this file however you want.

*/

#ifndef STBI_INCLUDE_STB_IMAGE_H
#define STBI_INCLUDE_STB_IMAGE_H

// DOCUMENTATION
//
// Limitations:
//    - no 16-bit-per-channel PNG
//    - no 12-bit-per-channel JPEG
//    - no JPEGs with arithmetic coding
//    - no 1-bit BMP
//    - GIF always returns *comp=4
//
// Basic usage (see HDR discussion below for HDR usage):
//    int x,y,n;
//    unsigned char *data = stbi_load(filename, &x, &y, &n, 0);
//    // ... process data if not NULL ...
//    // ... x = width, y = height, n = # 8-bit components per pixel ...
//    // ... replace '0' with '1'..'4' to force that many components per pixel
//    // ... but 'n' will always be the number that it would have been if you said 0
//    stbi_image_free(data);
//
// Standard parameters:
//    int *x       -- outputs image width in pixels
//    int *y       -- outputs image height in pixels
//    int *comp    -- outputs # of image components in image file
//    int req_comp -- if non-zero, # of image components requested in result
//
// The return value from an image loader is an 'unsigned char *' which points
// to the pixel data, or NULL on an allocation failure or if the image is
// corrupt or invalid. The pixel data consists of *y scanlines of *x pixels,
// with each pixel consisting of N interleaved 8-bit components; the first
// pixel pointed to is top-left-most in the image. There is no padding between
// image scanlines or between pixels, regardless of format. The number of
// components N is 'req_comp' if req_comp is non-zero, or *comp otherwise.
// If req_comp is non-zero, *comp has the number of components that _would_
// have been output otherwise. E.g. if you set req_comp to 4, you will always
// get RGBA output, but you can check *comp to see if it's trivially opaque
// because e.g. there were only 3 channels in the source image.
//
// An output image with N components has the following components interleaved
// in this order in each pixel:
//
//     N=#comp     components
//       1           grey
//       2           grey, alpha
//       3           red, green, blue
//       4           red, green, blue, alpha
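//
// A minimal sketch of indexing into the returned buffer, assuming the tightly
// packed, top-left-first layout described above. The helper names
// (demo_pixel_offset, demo_channel) are hypothetical and are not part of
// stb_image; 'n' is the actual component count of the buffer, i.e. req_comp
// if it was non-zero, or *comp otherwise.

```c
#include <stddef.h>

// Byte offset of pixel (px, py) in a buffer with 'width' pixels per
// scanline and 'n' interleaved 8-bit components per pixel (no padding).
size_t demo_pixel_offset(int px, int py, int width, int n)
{
   return ((size_t) py * (size_t) width + (size_t) px) * (size_t) n;
}

// Read component 'c' (0-based; e.g. 0 = red and 3 = alpha when n == 4)
// of pixel (px, py).
unsigned char demo_channel(const unsigned char *data, int px, int py,
                           int width, int n, int c)
{
   return data[demo_pixel_offset(px, py, width, n) + (size_t) c];
}
```

// The caller is expected to keep px, py, and c within the image bounds;
// this sketch does no range checking.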
//
// If image loading fails for any reason, the return value will be NULL,
// and *x, *y, *comp will be unchanged. The function stbi_failure_reason()
// can be queried for an extremely brief, end-user unfriendly explanation
// of why the load failed. Define STBI_NO_FAILURE_STRINGS to avoid
// compiling these strings at all, and STBI_FAILURE_USERMSG to get slightly
// more user-friendly ones.
//
// Paletted PNG, BMP, GIF, and PIC images are automatically depalettized.
//
// ===========================================================================
//
// Philosophy
//
// stb libraries are designed with the following priorities:
//
//    1. easy to use
//    2. easy to maintain
//    3. good performance
//
// Sometimes I let "good performance" creep up in priority over "easy to maintain",
// and for best performance I may provide less-easy-to-use APIs that give higher
// performance, in addition to the easy to use ones. Nevertheless, it's important
// to keep in mind that from the standpoint of you, a client of this library,
// all you care about is #1 and #3, and stb libraries do not emphasize #3 above all.
//
// Some secondary priorities arise directly from the first two, some of which
// make more explicit reasons why performance can't be emphasized.
//
//    - Portable ("ease of use")
//    - Small footprint ("easy to maintain")
//    - No dependencies ("ease of use")
//
// ===========================================================================
//
// I/O callbacks
//
// I/O callbacks allow you to read from arbitrary sources, like packaged
// files or some other source. Data read from callbacks are processed
// through a small internal buffer (currently 128 bytes) to try to reduce
// overhead.
//
// The three functions you must define are "read" (reads some bytes of data),
// "skip" (skips some bytes of data), "eof" (reports if the stream is at the end).
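//
// A minimal sketch of the three callbacks, reading from an in-memory blob.
// Everything named demo_* is hypothetical; in particular demo_io_callbacks
// is only a local mirror of the stbi_io_callbacks struct declared in this
// header, so that the sketch compiles on its own.

```c
#include <string.h>

// Local stand-in for stbi_io_callbacks (same three function pointers).
typedef struct
{
   int  (*read)(void *user, char *data, int size);
   void (*skip)(void *user, int n);
   int  (*eof) (void *user);
} demo_io_callbacks;

// Hypothetical user-data type: a read cursor over a block of bytes.
typedef struct
{
   const unsigned char *bytes;
   int len;
   int pos;
} demo_mem_stream;

// "read": copy up to 'size' bytes, return how many were actually copied.
int demo_read(void *user, char *data, int size)
{
   demo_mem_stream *s = (demo_mem_stream *) user;
   int remaining = s->len - s->pos;
   if (size > remaining) size = remaining;
   if (size < 0) size = 0;
   memcpy(data, s->bytes + s->pos, (size_t) size);
   s->pos += size;
   return size;
}

// "skip": move the cursor by n bytes (n may be negative), clamped to the blob.
void demo_skip(void *user, int n)
{
   demo_mem_stream *s = (demo_mem_stream *) user;
   s->pos += n;
   if (s->pos < 0) s->pos = 0;
   if (s->pos > s->len) s->pos = s->len;
}

// "eof": nonzero once every byte has been consumed.
int demo_eof(void *user)
{
   demo_mem_stream *s = (demo_mem_stream *) user;
   return s->pos >= s->len;
}

const demo_io_callbacks demo_callbacks = { demo_read, demo_skip, demo_eof };
```

// With the real header, a table like this would be passed to
// stbi_load_from_callbacks() together with a pointer to the stream as 'user'.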
//
// ===========================================================================
//
// SIMD support
//
// The JPEG decoder will try to automatically use SIMD kernels on x86 when
// supported by the compiler. For ARM Neon support, you must explicitly
// request it.
//
// (The old do-it-yourself SIMD API is no longer supported in the current
// code.)
//
// On x86, SSE2 will automatically be used when available based on a run-time
// test; if not, the generic C versions are used as a fall-back. On ARM targets,
// the typical path is to have separate builds for NEON and non-NEON devices
// (at least this is true for iOS and Android). Therefore, the NEON support is
// toggled by a build flag: define STBI_NEON to get NEON loops.
//
// The output of the JPEG decoder is slightly different from versions before
// SIMD support was introduced (that is, versions before 1.49). The
// difference is only +-1 in the 8-bit RGB channels, and only on a small
// fraction of pixels. You can force the pre-1.49 behavior by defining
// STBI_JPEG_OLD, but this will disable some of the SIMD decoding path
// and hence cost some performance.
//
// If for some reason you do not want to use any of the SIMD code, or if
// you have issues compiling it, you can disable it entirely by
// defining STBI_NO_SIMD.
//
// ===========================================================================
//
// HDR image support   (disable by defining STBI_NO_HDR)
//
// stb_image now supports loading HDR images in general, and currently
// the Radiance .HDR file format, although the support is provided
// generically. You can still load any file through the existing interface;
// if you attempt to load an HDR file, it will be automatically remapped to
// LDR, assuming gamma 2.2 and an arbitrary scale factor defaulting to 1;
// both of these constants can be reconfigured through this interface:
//
//     stbi_hdr_to_ldr_gamma(2.2f);
//     stbi_hdr_to_ldr_scale(1.0f);
//
// (note, do not use _inverse_ constants; stb_image will invert them
// appropriately).
//
// Additionally, there is a new, parallel interface for loading files as
// (linear) floats to preserve the full dynamic range:
//
//    float *data = stbi_loadf(filename, &x, &y, &n, 0);
//
// If you load LDR images through this interface, those images will
// be promoted to floating point values, run through the inverse of
// constants corresponding to the above:
//
//     stbi_ldr_to_hdr_scale(1.0f);
//     stbi_ldr_to_hdr_gamma(2.2f);
//
// Finally, given a filename (or an open file or memory block--see header
// file for details) containing image data, you can query for the "most
// appropriate" interface to use (that is, whether the image is HDR or
// not), using:
//
//     stbi_is_hdr(char *filename);
//
// ===========================================================================
//
// iPhone PNG support:
//
// By default we convert iphone-formatted PNGs back to RGB, even though
// they are internally encoded differently. You can disable this conversion
// by calling stbi_convert_iphone_png_to_rgb(0), in which case
// you will always just get the native iphone "format" through (which
// is BGR stored in RGB).
//
// Call stbi_set_unpremultiply_on_load(1) as well to force a divide per
// pixel to remove any premultiplied alpha *only* if the image file explicitly
// says there's premultiplied data (currently only happens in iPhone images,
// and only if iPhone convert-to-rgb processing is on).
//


#ifndef STBI_NO_STDIO
#include <stdio.h>
#endif // STBI_NO_STDIO

#define STBI_VERSION 1

enum
{
   STBI_default = 0, // only used for req_comp

   STBI_grey       = 1,
   STBI_grey_alpha = 2,
   STBI_rgb        = 3,
   STBI_rgb_alpha  = 4
};

typedef unsigned char stbi_uc;

#ifdef __cplusplus
extern "C" {
#endif

#ifdef STB_IMAGE_STATIC
#define STBIDEF static
#else
#define STBIDEF extern
#endif

//////////////////////////////////////////////////////////////////////////////
//
// PRIMARY API - works on images of any type
//

//
// load image by filename, open file, or memory buffer
//

typedef struct
{
   int      (*read)  (void *user,char *data,int size);   // fill 'data' with 'size' bytes.  return number of bytes actually read
   void     (*skip)  (void *user,int n);                 // skip the next 'n' bytes, or 'unget' the last -n bytes if negative
   int      (*eof)   (void *user);                       // returns nonzero if we are at end of file/data
} stbi_io_callbacks;

STBIDEF stbi_uc *stbi_load               (char              const *filename,           int *x, int *y, int *comp, int req_comp);
STBIDEF stbi_uc *stbi_load_from_memory   (stbi_uc           const *buffer, int len   , int *x, int *y, int *comp, int req_comp);
STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk  , void *user, int *x, int *y, int *comp, int req_comp);

#ifndef STBI_NO_STDIO
STBIDEF stbi_uc *stbi_load_from_file  (FILE *f,                  int *x, int *y, int *comp, int req_comp);
// for stbi_load_from_file, file pointer is left pointing immediately after image
#endif

#ifndef STBI_NO_LINEAR
   STBIDEF float *stbi_loadf                 (char const *filename,           int *x, int *y, int *comp, int req_comp);
   STBIDEF float *stbi_loadf_from_memory     (stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp);
   STBIDEF float *stbi_loadf_from_callbacks  (stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp);

   #ifndef STBI_NO_STDIO
   STBIDEF float *stbi_loadf_from_file  (FILE *f,                int *x, int *y, int *comp, int req_comp);
   #endif
#endif

#ifndef STBI_NO_HDR
   STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma);
   STBIDEF void   stbi_hdr_to_ldr_scale(float scale);
#endif

#ifndef STBI_NO_LINEAR
   STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma);
   STBIDEF void   stbi_ldr_to_hdr_scale(float scale);
#endif // STBI_NO_LINEAR

// stbi_is_hdr is always defined, but always returns false if STBI_NO_HDR
STBIDEF int    stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user);
STBIDEF int    stbi_is_hdr_from_memory(stbi_uc const *buffer, int len);
#ifndef STBI_NO_STDIO
STBIDEF int      stbi_is_hdr          (char const *filename);
STBIDEF int      stbi_is_hdr_from_file(FILE *f);
#endif // STBI_NO_STDIO


// get a VERY brief reason for failure
// NOT THREADSAFE
STBIDEF const char *stbi_failure_reason  (void);

// free the loaded image -- this is just free()
STBIDEF void     stbi_image_free      (void *retval_from_stbi_load);

// get image dimensions & components without fully decoding
STBIDEF int      stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp);
STBIDEF int      stbi_info_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp);

#ifndef STBI_NO_STDIO
STBIDEF int      stbi_info            (char const *filename,     int *x, int *y, int *comp);
STBIDEF int      stbi_info_from_file  (FILE *f,                  int *x, int *y, int *comp);

#endif



// for image formats that explicitly notate that they have premultiplied alpha,
// we just return the colors as stored in the file. set this flag to force
// unpremultiplication. results are undefined if the unpremultiply overflows.
STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply);

// indicate whether we should process iphone images back to canonical format,
// or just pass them through "as-is"
STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert);


// ZLIB client - used by PNG, available for other purposes

STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen);
STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header);
STBIDEF char *stbi_zlib_decode_malloc(const char *buffer, int len, int *outlen);
STBIDEF int   stbi_zlib_decode_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);

STBIDEF char *stbi_zlib_decode_noheader_malloc(const char *buffer, int len, int *outlen);
STBIDEF int   stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);


#ifdef __cplusplus
}
#endif

//
//
////   end header file   /////////////////////////////////////////////////////
#endif // STBI_INCLUDE_STB_IMAGE_H

#ifdef STB_IMAGE_IMPLEMENTATION

#if defined(STBI_ONLY_JPEG) || defined(STBI_ONLY_PNG) || defined(STBI_ONLY_BMP) \
  || defined(STBI_ONLY_TGA) || defined(STBI_ONLY_GIF) || defined(STBI_ONLY_PSD) \
  || defined(STBI_ONLY_HDR) || defined(STBI_ONLY_PIC) || defined(STBI_ONLY_PNM) \
  || defined(STBI_ONLY_ZLIB)
   #ifndef STBI_ONLY_JPEG
   #define STBI_NO_JPEG
   #endif
   #ifndef STBI_ONLY_PNG
   #define STBI_NO_PNG
   #endif
   #ifndef STBI_ONLY_BMP
   #define STBI_NO_BMP
   #endif
   #ifndef STBI_ONLY_PSD
   #define STBI_NO_PSD
   #endif
   #ifndef STBI_ONLY_TGA
   #define STBI_NO_TGA
   #endif
   #ifndef STBI_ONLY_GIF
   #define STBI_NO_GIF
   #endif
   #ifndef STBI_ONLY_HDR
   #define STBI_NO_HDR
   #endif
   #ifndef STBI_ONLY_PIC
   #define STBI_NO_PIC
   #endif
   #ifndef STBI_ONLY_PNM
   #define STBI_NO_PNM
   #endif
#endif
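
// A reduced sketch of the "only x and only y means only x&y" mapping used
// by the block above, cut down to three formats. The DEMO_* macro names and
// demo_enabled_decoders() are hypothetical stand-ins so the fragment can be
// compiled on its own; they are not symbols defined by stb_image.

```c
// The includer asks for two decoders; everything else should be dropped.
#define DEMO_ONLY_PNG
#define DEMO_ONLY_GIF

// If any DEMO_ONLY_* is set, turn each missing one into a DEMO_NO_*.
#if defined(DEMO_ONLY_JPEG) || defined(DEMO_ONLY_PNG) || defined(DEMO_ONLY_GIF)
   #ifndef DEMO_ONLY_JPEG
   #define DEMO_NO_JPEG
   #endif
   #ifndef DEMO_ONLY_PNG
   #define DEMO_NO_PNG
   #endif
   #ifndef DEMO_ONLY_GIF
   #define DEMO_NO_GIF
   #endif
#endif

// Report which hypothetical decoders survived, as a bitmask:
// bit 0 = JPEG, bit 1 = PNG, bit 2 = GIF.
int demo_enabled_decoders(void)
{
   int mask = 0;
#ifndef DEMO_NO_JPEG
   mask |= 1;
#endif
#ifndef DEMO_NO_PNG
   mask |= 2;
#endif
#ifndef DEMO_NO_GIF
   mask |= 4;
#endif
   return mask;
}
```

// Because the ONLY flags are translated into NO flags in one place, the rest
// of the code only ever has to test the NO form.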

#if defined(STBI_NO_PNG) && !defined(STBI_SUPPORT_ZLIB) && !defined(STBI_NO_ZLIB)
#define STBI_NO_ZLIB
#endif


#include <stdarg.h>
#include <stddef.h> // ptrdiff_t on osx
#include <stdlib.h>
#include <string.h>

#if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR)
#include <math.h>  // ldexp
#endif

#ifndef STBI_NO_STDIO
#include <stdio.h>
#endif

#ifndef STBI_ASSERT
#include <assert.h>
#define STBI_ASSERT(x) assert(x)
#endif


#ifndef _MSC_VER
   #ifdef __cplusplus
   #define stbi_inline inline
   #else
   #define stbi_inline
   #endif
#else
   #define stbi_inline __forceinline
#endif


#ifdef _MSC_VER
typedef unsigned short stbi__uint16;
typedef   signed short stbi__int16;
typedef unsigned int   stbi__uint32;
typedef   signed int   stbi__int32;
#else
#include <stdint.h>
typedef uint16_t stbi__uint16;
typedef int16_t  stbi__int16;
typedef uint32_t stbi__uint32;
typedef int32_t  stbi__int32;
#endif

// should produce compiler error if size is wrong
typedef unsigned char validate_uint32[sizeof(stbi__uint32)==4 ? 1 : -1];

#ifdef _MSC_VER
#define STBI_NOTUSED(v)  (void)(v)
#else
#define STBI_NOTUSED(v)  (void)sizeof(v)
#endif

#ifdef _MSC_VER
#define STBI_HAS_LROTL
#endif

#ifdef STBI_HAS_LROTL
   #define stbi_lrot(x,y)  _lrotl(x,y)
#else
   #define stbi_lrot(x,y)  (((x) << (y)) | ((x) >> (32 - (y))))
#endif

#if defined(STBI_MALLOC) && defined(STBI_FREE) && defined(STBI_REALLOC)
// ok
#elif !defined(STBI_MALLOC) && !defined(STBI_FREE) && !defined(STBI_REALLOC)
// ok
#else
#error "Must define all or none of STBI_MALLOC, STBI_FREE, and STBI_REALLOC."
#endif

#ifndef STBI_MALLOC
#define STBI_MALLOC(sz)    malloc(sz)
#define STBI_REALLOC(p,sz) realloc(p,sz)
#define STBI_FREE(p)       free(p)
#endif
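
// A minimal sketch of overriding the allocator from the including file.
// The three macros take no context argument, so the (hypothetical) counters
// below live in a global; all demo_* names are assumptions made for
// illustration. In real use these definitions would appear before
// #define STB_IMAGE_IMPLEMENTATION, and all three macros must be defined
// together or the #error above fires.

```c
#include <stdlib.h>

// Hypothetical allocator context, kept in a global because the macros
// cannot carry one. Not thread-safe as written.
typedef struct
{
   size_t alloc_calls;   // counts both malloc and realloc requests
   size_t free_calls;
} demo_alloc_stats;

demo_alloc_stats demo_stats;

void *demo_malloc(size_t sz)
{
   demo_stats.alloc_calls++;
   return malloc(sz);
}

void *demo_realloc(void *p, size_t sz)
{
   demo_stats.alloc_calls++;
   return realloc(p, sz);
}

void demo_free(void *p)
{
   demo_stats.free_calls++;
   free(p);
}

#define STBI_MALLOC(sz)     demo_malloc(sz)
#define STBI_REALLOC(p,sz)  demo_realloc(p,sz)
#define STBI_FREE(p)        demo_free(p)
```

// A thread-local variable would serve the same purpose as the global when
// loads happen on several threads.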

#if defined(__GNUC__) && !defined(__SSE2__) && !defined(STBI_NO_SIMD)
// gcc doesn't support sse2 intrinsics unless you compile with -msse2,
// (but compiling with -msse2 allows the compiler to use SSE2 everywhere;
// this is just broken and gcc are jerks for not fixing it properly
// http://www.virtualdub.org/blog/pivot/entry.php?id=363 )
#define STBI_NO_SIMD
#endif

#if !defined(STBI_NO_SIMD) && (defined(__x86_64__) || defined(_M_X64) || defined(__i386) || defined(_M_IX86))
#define STBI_SSE2
#include <emmintrin.h>

#ifdef _MSC_VER

#if _MSC_VER >= 1400  // not VC6
#include <intrin.h> // __cpuid
static int stbi__cpuid3(void)
{
   int info[4];
   __cpuid(info,1);
   return info[3];
}
#else
static int stbi__cpuid3(void)
{
   int res;
   __asm {
      mov  eax,1
      cpuid
      mov  res,edx
   }
   return res;
}
#endif

#define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name

static int stbi__sse2_available()
{
   int info3 = stbi__cpuid3();
   return ((info3 >> 26) & 1) != 0;
}
#else // assume GCC-style if not VC++
#define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))

static int stbi__sse2_available()
{
#if defined(__GNUC__) && (__GNUC__ * 100 + __GNUC_MINOR__) >= 408 // GCC 4.8 or later
   // GCC 4.8+ has a nice way to do this
   return __builtin_cpu_supports("sse2");
#else
   // portable way to do this, preferably without using GCC inline ASM?
   // just bail for now.
   return 0;
#endif
}
#endif
#endif
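
// The MSVC path above tests one bit of the EDX value returned by CPUID
// leaf 1. A small stand-alone sketch of that bit test follows; the names
// DEMO_CPUID1_EDX_SSE2_BIT and demo_edx_has_sse2 are hypothetical, and the
// function takes the register value as a parameter instead of executing
// CPUID, so it can be compiled on any target.

```c
// CPUID leaf 1 reports SSE2 support in bit 26 of EDX.
#define DEMO_CPUID1_EDX_SSE2_BIT 26

// Returns 1 if the given EDX value (from CPUID leaf 1) has the SSE2 bit set,
// 0 otherwise.
int demo_edx_has_sse2(unsigned int edx)
{
   return (int) ((edx >> DEMO_CPUID1_EDX_SSE2_BIT) & 1u);
}
```

// Using an unsigned value here avoids relying on how a signed right shift
// treats the top bit.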

// ARM NEON
#if defined(STBI_NO_SIMD) && defined(STBI_NEON)
#undef STBI_NEON
#endif

#ifdef STBI_NEON
#include <arm_neon.h>
// assume GCC or Clang on ARM targets
#define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
#endif

#ifndef STBI_SIMD_ALIGN
#define STBI_SIMD_ALIGN(type, name) type name
#endif

///////////////////////////////////////////////
//
//  stbi__context struct and start_xxx functions

// stbi__context structure is our basic context used by all images, so it
// contains all the IO context, plus some basic image information
typedef struct
{
   stbi__uint32 img_x, img_y;
   int img_n, img_out_n;

   stbi_io_callbacks io;
   void *io_user_data;

   int read_from_callbacks;
   int buflen;
   stbi_uc buffer_start[128];

   stbi_uc *img_buffer, *img_buffer_end;
   stbi_uc *img_buffer_original;
} stbi__context;


static void stbi__refill_buffer(stbi__context *s);

// initialize a memory-decode context
static void stbi__start_mem(stbi__context *s, stbi_uc const *buffer, int len)
{
   s->io.read = NULL;
   s->read_from_callbacks = 0;
   s->img_buffer = s->img_buffer_original = (stbi_uc *) buffer;
   s->img_buffer_end = (stbi_uc *) buffer+len;
}

// initialize a callback-based context
static void stbi__start_callbacks(stbi__context *s, stbi_io_callbacks *c, void *user)
{
   s->io = *c;
   s->io_user_data = user;
   s->buflen = sizeof(s->buffer_start);
   s->read_from_callbacks = 1;
   s->img_buffer_original = s->buffer_start;
   stbi__refill_buffer(s);
}

#ifndef STBI_NO_STDIO

static int stbi__stdio_read(void *user, char *data, int size)
{
   return (int) fread(data,1,size,(FILE*) user);
}

static void stbi__stdio_skip(void *user, int n)
{
   fseek((FILE*) user, n, SEEK_CUR);
}

static int stbi__stdio_eof(void *user)
{
   return feof((FILE*) user);
}

static stbi_io_callbacks stbi__stdio_callbacks =
{
   stbi__stdio_read,
   stbi__stdio_skip,
   stbi__stdio_eof,
};

static void stbi__start_file(stbi__context *s, FILE *f)
{
   stbi__start_callbacks(s, &stbi__stdio_callbacks, (void *) f);
}

//static void stop_file(stbi__context *s) { }

#endif // !STBI_NO_STDIO

static void stbi__rewind(stbi__context *s)
{
   // conceptually rewind SHOULD rewind to the beginning of the stream,
   // but we just rewind to the beginning of the initial buffer, because
   // we only use it after doing 'test', which only ever looks at at most 92 bytes
   s->img_buffer = s->img_buffer_original;
}

#ifndef STBI_NO_JPEG
static int      stbi__jpeg_test(stbi__context *s);
static stbi_uc *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
static int      stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp);
#endif

#ifndef STBI_NO_PNG
static int      stbi__png_test(stbi__context *s);
static stbi_uc *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
static int      stbi__png_info(stbi__context *s, int *x, int *y, int *comp);
#endif

#ifndef STBI_NO_BMP
static int      stbi__bmp_test(stbi__context *s);
static stbi_uc *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
static int      stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp);
#endif

#ifndef STBI_NO_TGA
static int      stbi__tga_test(stbi__context *s);
static stbi_uc *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
static int      stbi__tga_info(stbi__context *s, int *x, int *y, int *comp);
#endif

#ifndef STBI_NO_PSD
static int      stbi__psd_test(stbi__context *s);
static stbi_uc *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
static int      stbi__psd_info(stbi__context *s, int *x, int *y, int *comp);
#endif

#ifndef STBI_NO_HDR
static int      stbi__hdr_test(stbi__context *s);
static float   *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
static int      stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp);
#endif

#ifndef STBI_NO_PIC
static int      stbi__pic_test(stbi__context *s);
static stbi_uc *stbi__pic_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
static int      stbi__pic_info(stbi__context *s, int *x, int *y, int *comp);
#endif

#ifndef STBI_NO_GIF
static int      stbi__gif_test(stbi__context *s);
static stbi_uc *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
static int      stbi__gif_info(stbi__context *s, int *x, int *y, int *comp);
#endif

#ifndef STBI_NO_PNM
static int      stbi__pnm_test(stbi__context *s);
static stbi_uc *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp);
#endif

// this is not threadsafe
static const char *stbi__g_failure_reason;

STBIDEF const char *stbi_failure_reason(void)
{
   return stbi__g_failure_reason;
}

static int stbi__err(const char *str)
{
   stbi__g_failure_reason = str;
   return 0;
}

static void *stbi__malloc(size_t size)
{
    return STBI_MALLOC(size);
}

// stbi__err - error
// stbi__errpf - error returning pointer to float
// stbi__errpuc - error returning pointer to unsigned char

#ifdef STBI_NO_FAILURE_STRINGS
   #define stbi__err(x,y)  0
#elif defined(STBI_FAILURE_USERMSG)
   #define stbi__err(x,y)  stbi__err(y)
#else
   #define stbi__err(x,y)  stbi__err(x)
#endif

#define stbi__errpf(x,y)   ((float *) (stbi__err(x,y)?NULL:NULL))
#define stbi__errpuc(x,y)  ((unsigned char *) (stbi__err(x,y)?NULL:NULL))

STBIDEF void stbi_image_free(void *retval_from_stbi_load)
{
   STBI_FREE(retval_from_stbi_load);
}

#ifndef STBI_NO_LINEAR
static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp);
#endif

#ifndef STBI_NO_HDR
static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp);
#endif

static unsigned char *stbi_load_main(stbi__context *s, int *x, int *y, int *comp, int req_comp)
{
   #ifndef STBI_NO_JPEG
   if (stbi__jpeg_test(s)) return stbi__jpeg_load(s,x,y,comp,req_comp);
   #endif
   #ifndef STBI_NO_PNG
   if (stbi__png_test(s))  return stbi__png_load(s,x,y,comp,req_comp);
   #endif
   #ifndef STBI_NO_BMP
   if (stbi__bmp_test(s))  return stbi__bmp_load(s,x,y,comp,req_comp);
   #endif
   #ifndef STBI_NO_GIF
   if (stbi__gif_test(s))  return stbi__gif_load(s,x,y,comp,req_comp);
   #endif
   #ifndef STBI_NO_PSD
   if (stbi__psd_test(s))  return stbi__psd_load(s,x,y,comp,req_comp);
   #endif
   #ifndef STBI_NO_PIC
   if (stbi__pic_test(s))  return stbi__pic_load(s,x,y,comp,req_comp);
   #endif
   #ifndef STBI_NO_PNM
   if (stbi__pnm_test(s))  return stbi__pnm_load(s,x,y,comp,req_comp);
   #endif

   #ifndef STBI_NO_HDR
   if (stbi__hdr_test(s)) {
      float *hdr = stbi__hdr_load(s, x,y,comp,req_comp);
      return stbi__hdr_to_ldr(hdr, *x, *y, req_comp ? req_comp : *comp);
   }
   #endif

   #ifndef STBI_NO_TGA
   // test tga last because it's a crappy test!
   if (stbi__tga_test(s))
      return stbi__tga_load(s,x,y,comp,req_comp);
   #endif

   return stbi__errpuc("unknown image type", "Image not of any known type, or corrupt");
}
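
// stbi_load_main above tries each format's test in a fixed order and hands
// the stream to the first decoder that claims it. A stand-alone sketch of
// that "sniff, then dispatch" idea follows, using two well-known file
// signatures. It is an illustration only: the demo_* names are hypothetical,
// and these checks are not claimed to match what stbi__png_test or
// stbi__jpeg_test actually examine.

```c
#include <stddef.h>
#include <string.h>

typedef enum { DEMO_FMT_UNKNOWN, DEMO_FMT_JPEG, DEMO_FMT_PNG } demo_format;

// PNG files begin with this fixed 8-byte signature.
static const unsigned char demo_png_sig[8] = { 137, 80, 78, 71, 13, 10, 26, 10 };

int demo_is_png(const unsigned char *buf, size_t len)
{
   return len >= 8 && memcmp(buf, demo_png_sig, 8) == 0;
}

// JPEG streams begin with the SOI marker 0xFF 0xD8.
int demo_is_jpeg(const unsigned char *buf, size_t len)
{
   return len >= 2 && buf[0] == 0xFF && buf[1] == 0xD8;
}

// Try each test in a fixed order; the first match wins.
demo_format demo_detect(const unsigned char *buf, size_t len)
{
   if (demo_is_jpeg(buf, len)) return DEMO_FMT_JPEG;
   if (demo_is_png(buf, len))  return DEMO_FMT_PNG;
   return DEMO_FMT_UNKNOWN;
}
```

// Formats with a weak or missing signature are best tested last, which is
// the reason given above for placing the TGA test at the end.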
928 #ifndef STBI_NO_STDIO
930 static FILE *stbi__fopen(char const *filename, char const *mode)
932 FILE *f;
933 #if defined(_MSC_VER) && _MSC_VER >= 1400
934 if (0 != fopen_s(&f, filename, mode))
935 f=0;
936 #else
937 f = fopen(filename, mode);
938 #endif
939 return f;
943 STBIDEF stbi_uc *stbi_load(char const *filename, int *x, int *y, int *comp, int req_comp)
945 FILE *f = stbi__fopen(filename, "rb");
946 unsigned char *result;
947 if (!f) return stbi__errpuc("can't fopen", "Unable to open file");
948 result = stbi_load_from_file(f,x,y,comp,req_comp);
949 fclose(f);
950 return result;
953 STBIDEF stbi_uc *stbi_load_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
955 unsigned char *result;
956 stbi__context s;
957 stbi__start_file(&s,f);
958 result = stbi_load_main(&s,x,y,comp,req_comp);
959 if (result) {
960 // need to 'unget' all the characters in the IO buffer
961 fseek(f, - (int) (s.img_buffer_end - s.img_buffer), SEEK_CUR);
963 return result;
965 #endif //!STBI_NO_STDIO
967 STBIDEF stbi_uc *stbi_load_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
969 stbi__context s;
970 stbi__start_mem(&s,buffer,len);
971 return stbi_load_main(&s,x,y,comp,req_comp);
974 STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
976 stbi__context s;
977 stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
978 return stbi_load_main(&s,x,y,comp,req_comp);
981 #ifndef STBI_NO_LINEAR
982 static float *stbi_loadf_main(stbi__context *s, int *x, int *y, int *comp, int req_comp)
984 unsigned char *data;
985 #ifndef STBI_NO_HDR
986 if (stbi__hdr_test(s))
987 return stbi__hdr_load(s,x,y,comp,req_comp);
988 #endif
989 data = stbi_load_main(s, x, y, comp, req_comp);
990 if (data)
991 return stbi__ldr_to_hdr(data, *x, *y, req_comp ? req_comp : *comp);
992 return stbi__errpf("unknown image type", "Image not of any known type, or corrupt");
995 STBIDEF float *stbi_loadf_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
997 stbi__context s;
998 stbi__start_mem(&s,buffer,len);
999 return stbi_loadf_main(&s,x,y,comp,req_comp);
1002 STBIDEF float *stbi_loadf_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
1004 stbi__context s;
1005 stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
1006 return stbi_loadf_main(&s,x,y,comp,req_comp);
1009 #ifndef STBI_NO_STDIO
1010 STBIDEF float *stbi_loadf(char const *filename, int *x, int *y, int *comp, int req_comp)
1012 float *result;
1013 FILE *f = stbi__fopen(filename, "rb");
1014 if (!f) return stbi__errpf("can't fopen", "Unable to open file");
1015 result = stbi_loadf_from_file(f,x,y,comp,req_comp);
1016 fclose(f);
1017 return result;
1020 STBIDEF float *stbi_loadf_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
1022 stbi__context s;
1023 stbi__start_file(&s,f);
1024 return stbi_loadf_main(&s,x,y,comp,req_comp);
1026 #endif // !STBI_NO_STDIO
1028 #endif // !STBI_NO_LINEAR
// the is-hdr-or-not queries are defined regardless of whether STBI_NO_HDR is
// defined, for API simplicity; if STBI_NO_HDR is defined, they always
// report false!
1034 STBIDEF int stbi_is_hdr_from_memory(stbi_uc const *buffer, int len)
1036 #ifndef STBI_NO_HDR
1037 stbi__context s;
1038 stbi__start_mem(&s,buffer,len);
1039 return stbi__hdr_test(&s);
1040 #else
1041 STBI_NOTUSED(buffer);
1042 STBI_NOTUSED(len);
1043 return 0;
1044 #endif
1047 #ifndef STBI_NO_STDIO
1048 STBIDEF int stbi_is_hdr (char const *filename)
1050 FILE *f = stbi__fopen(filename, "rb");
1051 int result=0;
1052 if (f) {
1053 result = stbi_is_hdr_from_file(f);
1054 fclose(f);
1056 return result;
1059 STBIDEF int stbi_is_hdr_from_file(FILE *f)
1061 #ifndef STBI_NO_HDR
1062 stbi__context s;
1063 stbi__start_file(&s,f);
1064 return stbi__hdr_test(&s);
1065 #else
1066 return 0;
1067 #endif
1069 #endif // !STBI_NO_STDIO
1071 STBIDEF int stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user)
1073 #ifndef STBI_NO_HDR
1074 stbi__context s;
1075 stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
1076 return stbi__hdr_test(&s);
1077 #else
1078 return 0;
1079 #endif
1082 static float stbi__h2l_gamma_i=1.0f/2.2f, stbi__h2l_scale_i=1.0f;
1083 static float stbi__l2h_gamma=2.2f, stbi__l2h_scale=1.0f;
1085 #ifndef STBI_NO_LINEAR
1086 STBIDEF void stbi_ldr_to_hdr_gamma(float gamma) { stbi__l2h_gamma = gamma; }
1087 STBIDEF void stbi_ldr_to_hdr_scale(float scale) { stbi__l2h_scale = scale; }
1088 #endif
1090 STBIDEF void stbi_hdr_to_ldr_gamma(float gamma) { stbi__h2l_gamma_i = 1/gamma; }
1091 STBIDEF void stbi_hdr_to_ldr_scale(float scale) { stbi__h2l_scale_i = 1/scale; }
1094 //////////////////////////////////////////////////////////////////////////////
1096 // Common code used by all image loaders
1099 enum
1101 STBI__SCAN_load=0,
1102 STBI__SCAN_type,
1103 STBI__SCAN_header
1106 static void stbi__refill_buffer(stbi__context *s)
1108 int n = (s->io.read)(s->io_user_data,(char*)s->buffer_start,s->buflen);
1109 if (n == 0) {
1110 // at end of file, treat same as if from memory, but need to handle case
1111 // where s->img_buffer isn't pointing to safe memory, e.g. 0-byte file
1112 s->read_from_callbacks = 0;
1113 s->img_buffer = s->buffer_start;
1114 s->img_buffer_end = s->buffer_start+1;
1115 *s->img_buffer = 0;
1116 } else {
1117 s->img_buffer = s->buffer_start;
1118 s->img_buffer_end = s->buffer_start + n;
1122 stbi_inline static stbi_uc stbi__get8(stbi__context *s)
1124 if (s->img_buffer < s->img_buffer_end)
1125 return *s->img_buffer++;
1126 if (s->read_from_callbacks) {
1127 stbi__refill_buffer(s);
1128 return *s->img_buffer++;
1130 return 0;
1133 stbi_inline static int stbi__at_eof(stbi__context *s)
1135 if (s->io.read) {
1136 if (!(s->io.eof)(s->io_user_data)) return 0;
1137 // if feof() is true, check if buffer = end
1138 // special case: we've only got the special 0 character at the end
1139 if (s->read_from_callbacks == 0) return 1;
1142 return s->img_buffer >= s->img_buffer_end;
1145 static void stbi__skip(stbi__context *s, int n)
1147 if (s->io.read) {
1148 int blen = (int) (s->img_buffer_end - s->img_buffer);
1149 if (blen < n) {
1150 s->img_buffer = s->img_buffer_end;
1151 (s->io.skip)(s->io_user_data, n - blen);
1152 return;
1155 s->img_buffer += n;
1158 static int stbi__getn(stbi__context *s, stbi_uc *buffer, int n)
1160 if (s->io.read) {
1161 int blen = (int) (s->img_buffer_end - s->img_buffer);
1162 if (blen < n) {
1163 int res, count;
1165 memcpy(buffer, s->img_buffer, blen);
1167 count = (s->io.read)(s->io_user_data, (char*) buffer + blen, n - blen);
1168 res = (count == (n-blen));
1169 s->img_buffer = s->img_buffer_end;
1170 return res;
1174 if (s->img_buffer+n <= s->img_buffer_end) {
1175 memcpy(buffer, s->img_buffer, n);
1176 s->img_buffer += n;
1177 return 1;
1178 } else
1179 return 0;
static int stbi__get16be(stbi__context *s)
{
   int z = stbi__get8(s);
   return (z << 8) + stbi__get8(s);
}

static stbi__uint32 stbi__get32be(stbi__context *s)
{
   stbi__uint32 z = stbi__get16be(s);
   return (z << 16) + stbi__get16be(s);
}

static int stbi__get16le(stbi__context *s)
{
   int z = stbi__get8(s);
   return z + (stbi__get8(s) << 8);
}

static stbi__uint32 stbi__get32le(stbi__context *s)
{
   stbi__uint32 z = stbi__get16le(s);
   // widen before shifting so a high half >= 0x8000 cannot overflow a signed int
   return z + ((stbi__uint32) stbi__get16le(s) << 16);
}

#define STBI__BYTECAST(x)  ((stbi_uc) ((x) & 255))  // truncate int to byte without warnings
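/* A minimal sketch of the byte-order composition used by the 16/32-bit
   readers above. It is not part of stb_image: the demo_read_* names are
   hypothetical, and they read from a plain byte array rather than from an
   stbi__context. */

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Big-endian: the first byte is the most significant one. */
static uint32_t demo_read_be16(const unsigned char *p)
{
   assert(p != NULL);
   return ((uint32_t) p[0] << 8) | p[1];
}

static uint32_t demo_read_be32(const unsigned char *p)
{
   return (demo_read_be16(p) << 16) | demo_read_be16(p + 2);
}

/* Little-endian: the first byte is the least significant one. */
static uint32_t demo_read_le16(const unsigned char *p)
{
   assert(p != NULL);
   return p[0] | ((uint32_t) p[1] << 8);
}

static uint32_t demo_read_le32(const unsigned char *p)
{
   return demo_read_le16(p) | (demo_read_le16(p + 2) << 16);
}
```

/* All arithmetic here is done in uint32_t, so no shift can overflow a
   signed int. */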
1209 //////////////////////////////////////////////////////////////////////////////
1211 // generic converter from built-in img_n to req_comp
1212 // individual types do this automatically as much as possible (e.g. jpeg
1213 // does all cases internally since it needs to colorspace convert anyway,
1214 // and it never has alpha, so very few cases ). png can automatically
1215 // interleave an alpha=255 channel, but falls back to this for other cases
1217 // assume data buffer is malloced, so malloc a new one and free that one
1218 // only failure mode is malloc failing
static stbi_uc stbi__compute_y(int r, int g, int b)
{
   return (stbi_uc) (((r*77) + (g*150) + (29*b)) >> 8);
}
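/* A small sketch of why the weights in stbi__compute_y work: 77 + 150 + 29
   is exactly 256, so the final >> 8 divides by the weight total and the
   result stays within 0..255. demo_luma is a hypothetical stand-alone copy
   for illustration, not a function of stb_image. */

```c
#include <assert.h>

/* Integer luma approximation with weights that sum to 256. */
static unsigned char demo_luma(int r, int g, int b)
{
   assert(r >= 0 && r <= 255);
   assert(g >= 0 && g <= 255);
   assert(b >= 0 && b <= 255);
   /* the largest possible sum is 255*256, so >> 8 yields at most 255 */
   return (unsigned char) ((r*77 + g*150 + b*29) >> 8);
}
```

/* The shift truncates rather than rounds, so a pure primary lands slightly
   below its ideal weight (for example, full red gives 76, not 77). */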
1225 static unsigned char *stbi__convert_format(unsigned char *data, int img_n, int req_comp, unsigned int x, unsigned int y)
1227 int i,j;
1228 unsigned char *good;
1230 if (req_comp == img_n) return data;
1231 STBI_ASSERT(req_comp >= 1 && req_comp <= 4);
1233 good = (unsigned char *) stbi__malloc(req_comp * x * y);
1234 if (good == NULL) {
1235 STBI_FREE(data);
1236 return stbi__errpuc("outofmem", "Out of memory");
1239 for (j=0; j < (int) y; ++j) {
1240 unsigned char *src = data + j * x * img_n ;
1241 unsigned char *dest = good + j * x * req_comp;
1243 #define COMBO(a,b) ((a)*8+(b))
1244 #define CASE(a,b) case COMBO(a,b): for(i=x-1; i >= 0; --i, src += a, dest += b)
1245 // convert source image with img_n components to one with req_comp components;
1246 // avoid switch per pixel, so use switch per scanline and massive macros
1247 switch (COMBO(img_n, req_comp)) {
1248 CASE(1,2) dest[0]=src[0], dest[1]=255; break;
1249 CASE(1,3) dest[0]=dest[1]=dest[2]=src[0]; break;
1250 CASE(1,4) dest[0]=dest[1]=dest[2]=src[0], dest[3]=255; break;
1251 CASE(2,1) dest[0]=src[0]; break;
1252 CASE(2,3) dest[0]=dest[1]=dest[2]=src[0]; break;
1253 CASE(2,4) dest[0]=dest[1]=dest[2]=src[0], dest[3]=src[1]; break;
1254 CASE(3,4) dest[0]=src[0],dest[1]=src[1],dest[2]=src[2],dest[3]=255; break;
1255 CASE(3,1) dest[0]=stbi__compute_y(src[0],src[1],src[2]); break;
1256 CASE(3,2) dest[0]=stbi__compute_y(src[0],src[1],src[2]), dest[1] = 255; break;
1257 CASE(4,1) dest[0]=stbi__compute_y(src[0],src[1],src[2]); break;
1258 CASE(4,2) dest[0]=stbi__compute_y(src[0],src[1],src[2]), dest[1] = src[3]; break;
1259 CASE(4,3) dest[0]=src[0],dest[1]=src[1],dest[2]=src[2]; break;
1260 default: STBI_ASSERT(0);
1262 #undef CASE
1265 STBI_FREE(data);
1266 return good;
1269 #ifndef STBI_NO_LINEAR
1270 static float *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp)
1272 int i,k,n;
1273 float *output = (float *) stbi__malloc(x * y * comp * sizeof(float));
1274 if (output == NULL) { STBI_FREE(data); return stbi__errpf("outofmem", "Out of memory"); }
1275 // compute number of non-alpha components
1276 if (comp & 1) n = comp; else n = comp-1;
1277 for (i=0; i < x*y; ++i) {
1278 for (k=0; k < n; ++k) {
1279 output[i*comp + k] = (float) (pow(data[i*comp+k]/255.0f, stbi__l2h_gamma) * stbi__l2h_scale);
1281 if (k < comp) output[i*comp + k] = data[i*comp+k]/255.0f;
1283 STBI_FREE(data);
1284 return output;
1286 #endif
1288 #ifndef STBI_NO_HDR
1289 #define stbi__float2int(x) ((int) (x))
1290 static stbi_uc *stbi__hdr_to_ldr(float *data, int x, int y, int comp)
1292 int i,k,n;
1293 stbi_uc *output = (stbi_uc *) stbi__malloc(x * y * comp);
1294 if (output == NULL) { STBI_FREE(data); return stbi__errpuc("outofmem", "Out of memory"); }
1295 // compute number of non-alpha components
1296 if (comp & 1) n = comp; else n = comp-1;
1297 for (i=0; i < x*y; ++i) {
1298 for (k=0; k < n; ++k) {
1299 float z = (float) pow(data[i*comp+k]*stbi__h2l_scale_i, stbi__h2l_gamma_i) * 255 + 0.5f;
1300 if (z < 0) z = 0;
1301 if (z > 255) z = 255;
1302 output[i*comp + k] = (stbi_uc) stbi__float2int(z);
1304 if (k < comp) {
1305 float z = data[i*comp+k] * 255 + 0.5f;
1306 if (z < 0) z = 0;
1307 if (z > 255) z = 255;
1308 output[i*comp + k] = (stbi_uc) stbi__float2int(z);
1311 STBI_FREE(data);
1312 return output;
1314 #endif
1316 //////////////////////////////////////////////////////////////////////////////
1318 // "baseline" JPEG/JFIF decoder
1320 // simple implementation
1321 // - doesn't support delayed output of y-dimension
1322 // - simple interface (only one output format: 8-bit interleaved RGB)
1323 // - doesn't try to recover corrupt jpegs
1324 // - doesn't allow partial loading, loading multiple at once
1325 // - still fast on x86 (copying globals into locals doesn't help x86)
1326 // - allocates lots of intermediate memory (full size of all components)
1327 // - non-interleaved case requires this anyway
1328 // - allows good upsampling (see next)
1329 // high-quality
1330 // - upsampled channels are bilinearly interpolated, even across blocks
1331 // - quality integer IDCT derived from IJG's 'slow'
1332 // performance
1333 // - fast huffman; reasonable integer IDCT
1334 // - some SIMD kernels for common paths on targets with SSE2/NEON
1335 // - uses a lot of intermediate memory, could cache poorly
1337 #ifndef STBI_NO_JPEG
1339 // huffman decoding acceleration
1340 #define FAST_BITS 9 // larger handles more cases; smaller stomps less cache
1342 typedef struct
1344 stbi_uc fast[1 << FAST_BITS];
1345 // weirdly, repacking this into AoS is a 10% speed loss, instead of a win
1346 stbi__uint16 code[256];
1347 stbi_uc values[256];
1348 stbi_uc size[257];
1349 unsigned int maxcode[18];
1350 int delta[17]; // old 'firstsymbol' - old 'firstcode'
1351 } stbi__huffman;
1353 typedef struct
1355 stbi__context *s;
1356 stbi__huffman huff_dc[4];
1357 stbi__huffman huff_ac[4];
1358 stbi_uc dequant[4][64];
1359 stbi__int16 fast_ac[4][1 << FAST_BITS];
1361 // sizes for components, interleaved MCUs
1362 int img_h_max, img_v_max;
1363 int img_mcu_x, img_mcu_y;
1364 int img_mcu_w, img_mcu_h;
1366 // definition of jpeg image component
1367 struct
1369 int id;
1370 int h,v;
1371 int tq;
1372 int hd,ha;
1373 int dc_pred;
1375 int x,y,w2,h2;
1376 stbi_uc *data;
1377 void *raw_data, *raw_coeff;
1378 stbi_uc *linebuf;
1379 short *coeff; // progressive only
1380 int coeff_w, coeff_h; // number of 8x8 coefficient blocks
1381 } img_comp[4];
1383 stbi__uint32 code_buffer; // jpeg entropy-coded buffer
1384 int code_bits; // number of valid bits
1385 unsigned char marker; // marker seen while filling entropy buffer
1386 int nomore; // flag if we saw a marker so must stop
1388 int progressive;
1389 int spec_start;
1390 int spec_end;
1391 int succ_high;
1392 int succ_low;
1393 int eob_run;
1395 int scan_n, order[4];
1396 int restart_interval, todo;
1398 // kernels
1399 void (*idct_block_kernel)(stbi_uc *out, int out_stride, short data[64]);
1400 void (*YCbCr_to_RGB_kernel)(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step);
1401 stbi_uc *(*resample_row_hv_2_kernel)(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs);
1402 } stbi__jpeg;
static int stbi__build_huffman(stbi__huffman *h, int *count)
{
   int i,j,k=0,code;
   // build size list for each symbol (from JPEG spec)
   for (i=0; i < 16; ++i)
      for (j=0; j < count[i]; ++j)
         h->size[k++] = (stbi_uc) (i+1);
   h->size[k] = 0;

   // compute actual symbols (from jpeg spec)
   code = 0;
   k = 0;
   for(j=1; j <= 16; ++j) {
      // compute delta to add to code to compute symbol id
      h->delta[j] = k - code;
      if (h->size[k] == j) {
         while (h->size[k] == j)
            h->code[k++] = (stbi__uint16) (code++);
         if (code-1 >= (1 << j)) return stbi__err("bad code lengths","Corrupt JPEG");
      }
      // compute largest code + 1 for this size, preshifted as needed later
      h->maxcode[j] = code << (16-j);
      code <<= 1;
   }
   h->maxcode[j] = 0xffffffff;

   // build non-spec acceleration table; 255 is flag for not-accelerated
   memset(h->fast, 255, 1 << FAST_BITS);
   for (i=0; i < k; ++i) {
      int s = h->size[i];
      if (s <= FAST_BITS) {
         int c = h->code[i] << (FAST_BITS-s);
         int m = 1 << (FAST_BITS-s);
         for (j=0; j < m; ++j) {
            h->fast[c+j] = (stbi_uc) i;
         }
      }
   }
   return 1;
}
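/* A minimal sketch of the canonical code assignment that the first half of
   stbi__build_huffman performs: codes of each length are consecutive
   integers, and the running code is doubled when moving to the next length.
   demo_assign_codes and its parameters are hypothetical; unlike the function
   above, it checks that a code fits before assigning it rather than after. */

```c
#include <assert.h>
#include <stddef.h>

/* count[i] is the number of codes of length i+1 (JPEG DHT layout).
   Fills size[] and code[] per symbol index and returns the symbol count,
   or -1 if the lengths cannot form a valid prefix code. */
static int demo_assign_codes(const int count[16], unsigned char size[256],
                             unsigned short code[256])
{
   int i, j, k = 0, next = 0;
   assert(count != NULL && size != NULL && code != NULL);
   for (i = 0; i < 16; ++i) {
      for (j = 0; j < count[i]; ++j) {
         if (k >= 256) return -1;               /* too many symbols */
         if (next >= (1 << (i+1))) return -1;   /* code does not fit in i+1 bits */
         size[k] = (unsigned char) (i+1);
         code[k] = (unsigned short) next++;
         ++k;
      }
      next <<= 1;   /* moving to the next length appends a zero bit */
   }
   return k;
}
```

/* With the counts {0,1,5,1,1,1,1,1,1,0,...} this yields the 2-bit code 00,
   the 3-bit codes 010 through 110, then 1110, 11110, and so on. */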
// build a table that decodes both magnitude and value of small ACs in
// one go.
static void stbi__build_fast_ac(stbi__int16 *fast_ac, stbi__huffman *h)
{
   int i;
   for (i=0; i < (1 << FAST_BITS); ++i) {
      stbi_uc fast = h->fast[i];
      fast_ac[i] = 0;
      if (fast < 255) {
         int rs = h->values[fast];
         int run = (rs >> 4) & 15;
         int magbits = rs & 15;
         int len = h->size[fast];

         if (magbits && len + magbits <= FAST_BITS) {
            // magnitude code followed by receive_extend code
            int k = ((i << len) & ((1 << FAST_BITS) - 1)) >> (FAST_BITS - magbits);
            int m = 1 << (magbits - 1);
            // same value as (-1 << magbits) + 1, without left-shifting a negative int
            if (k < m) k -= (1 << magbits) - 1;
            // if the result is small enough, we can fit it in fast_ac table
            if (k >= -128 && k <= 127)
               fast_ac[i] = (stbi__int16) ((k * 256) + (run * 16) + (len + magbits));
         }
      }
   }
}
1472 static void stbi__grow_buffer_unsafe(stbi__jpeg *j)
1474 do {
1475 int b = j->nomore ? 0 : stbi__get8(j->s);
1476 if (b == 0xff) {
1477 int c = stbi__get8(j->s);
1478 if (c != 0) {
1479 j->marker = (unsigned char) c;
1480 j->nomore = 1;
1481 return;
1484 j->code_buffer |= b << (24 - j->code_bits);
1485 j->code_bits += 8;
1486 } while (j->code_bits <= 24);
1489 // (1 << n) - 1
1490 static stbi__uint32 stbi__bmask[17]={0,1,3,7,15,31,63,127,255,511,1023,2047,4095,8191,16383,32767,65535};
1492 // decode a jpeg huffman value from the bitstream
1493 stbi_inline static int stbi__jpeg_huff_decode(stbi__jpeg *j, stbi__huffman *h)
1495 unsigned int temp;
1496 int c,k;
1498 if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
1500 // look at the top FAST_BITS and determine what symbol ID it is,
1501 // if the code is <= FAST_BITS
1502 c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
1503 k = h->fast[c];
1504 if (k < 255) {
1505 int s = h->size[k];
1506 if (s > j->code_bits)
1507 return -1;
1508 j->code_buffer <<= s;
1509 j->code_bits -= s;
1510 return h->values[k];
1513 // naive test is to shift the code_buffer down so k bits are
1514 // valid, then test against maxcode. To speed this up, we've
1515 // preshifted maxcode left so that it has (16-k) 0s at the
1516 // end; in other words, regardless of the number of bits, it
1517 // wants to be compared against something shifted to have 16;
1518 // that way we don't need to shift inside the loop.
1519 temp = j->code_buffer >> 16;
1520 for (k=FAST_BITS+1 ; ; ++k)
1521 if (temp < h->maxcode[k])
1522 break;
1523 if (k == 17) {
1524 // error! code not found
1525 j->code_bits -= 16;
1526 return -1;
1529 if (k > j->code_bits)
1530 return -1;
1532 // convert the huffman code to the symbol id
1533 c = ((j->code_buffer >> (32 - k)) & stbi__bmask[k]) + h->delta[k];
1534 STBI_ASSERT((((j->code_buffer) >> (32 - h->size[c])) & stbi__bmask[h->size[c]]) == h->code[c]);
1536 // convert the id to a symbol
1537 j->code_bits -= k;
1538 j->code_buffer <<= k;
1539 return h->values[c];
// bias[n] = (-1<<n) + 1
static int const stbi__jbias[16] = {0,-1,-3,-7,-15,-31,-63,-127,-255,-511,-1023,-2047,-4095,-8191,-16383,-32767};

// combined JPEG 'receive' and JPEG 'extend', since baseline
// always extends everything it receives.
stbi_inline static int stbi__extend_receive(stbi__jpeg *j, int n)
{
   unsigned int k;
   int sgn;
   if (j->code_bits < n) stbi__grow_buffer_unsafe(j);

   sgn = (stbi__int32)j->code_buffer >> 31; // sign bit is always in MSB
   k = stbi_lrot(j->code_buffer, n);
   j->code_buffer = k & ~stbi__bmask[n];
   k &= stbi__bmask[n];
   j->code_bits -= n;
   return k + (stbi__jbias[n] & ~sgn);
}
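/* A sketch of the 'extend' half of stbi__extend_receive in isolation. An
   n-bit field whose leading bit is 0 encodes a negative number, obtained by
   adding the bias -(2^n - 1); a leading 1 means the field is the value
   itself. demo_extend is a hypothetical helper that takes the already
   extracted bits instead of a bit buffer. */

```c
#include <assert.h>

/* bits holds an n-bit field (1 <= n <= 15) read MSB-first from the stream. */
static int demo_extend(unsigned int bits, int n)
{
   assert(n >= 1 && n <= 15);
   assert(bits < (1u << n));
   if (bits < (1u << (n-1)))                /* leading bit is 0: negative */
      return (int) bits - ((1 << n) - 1);   /* same as adding stbi__jbias[n] */
   return (int) bits;                       /* leading bit is 1: positive */
}
```

/* For n = 3 the eight bit patterns 000..111 map to -7,-6,-5,-4,4,5,6,7, so
   the magnitudes 0..3 (which need fewer bits) are never produced. */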
1561 // get some unsigned bits
1562 stbi_inline static int stbi__jpeg_get_bits(stbi__jpeg *j, int n)
1564 unsigned int k;
1565 if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
1566 k = stbi_lrot(j->code_buffer, n);
1567 j->code_buffer = k & ~stbi__bmask[n];
1568 k &= stbi__bmask[n];
1569 j->code_bits -= n;
1570 return k;
1573 stbi_inline static int stbi__jpeg_get_bit(stbi__jpeg *j)
1575 unsigned int k;
1576 if (j->code_bits < 1) stbi__grow_buffer_unsafe(j);
1577 k = j->code_buffer;
1578 j->code_buffer <<= 1;
1579 --j->code_bits;
1580 return k & 0x80000000;
// given a value that's at position X in the zigzag stream,
// where does it appear in the 8x8 matrix coded as row-major?
static stbi_uc stbi__jpeg_dezigzag[64+15] =
{
    0,  1,  8, 16,  9,  2,  3, 10,
   17, 24, 32, 25, 18, 11,  4,  5,
   12, 19, 26, 33, 40, 48, 41, 34,
   27, 20, 13,  6,  7, 14, 21, 28,
   35, 42, 49, 56, 57, 50, 43, 36,
   29, 22, 15, 23, 30, 37, 44, 51,
   58, 59, 52, 45, 38, 31, 39, 46,
   53, 60, 61, 54, 47, 55, 62, 63,
   // let corrupt input sample past end
   63, 63, 63, 63, 63, 63, 63, 63,
   63, 63, 63, 63, 63, 63, 63
};
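/* A sketch showing where the first 64 entries of the table above come from:
   walk the 15 anti-diagonals of the 8x8 block, alternating direction on each
   one. demo_build_dezigzag is a hypothetical helper for illustration;
   stb_image itself uses the literal table. */

```c
#include <assert.h>
#include <stddef.h>

/* Fills table[i] with the row-major index (row*8 + col) of the i-th
   coefficient in zigzag order. */
static void demo_build_dezigzag(unsigned char table[64])
{
   int d, t, idx = 0;
   assert(table != NULL);
   for (d = 0; d < 15; ++d) {           /* d = row + col of the diagonal */
      int lo = d < 8 ? 0 : d - 7;       /* smallest valid row on this diagonal */
      int hi = d < 8 ? d : 7;           /* largest valid row on this diagonal */
      for (t = lo; t <= hi; ++t) {
         /* odd diagonals run top-right to bottom-left (row increasing),
            even diagonals run bottom-left to top-right (row decreasing) */
         int row = (d & 1) ? t : (lo + hi - t);
         int col = d - row;
         table[idx++] = (unsigned char) (row*8 + col);
      }
   }
   assert(idx == 64);
}
```

/* The 15 extra entries of 63 in the real table are padding: a run length in
   a corrupt stream can push the index past 63, and the padding keeps that
   lookup inside the array. */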
1600 // decode one 64-entry block--
1601 static int stbi__jpeg_decode_block(stbi__jpeg *j, short data[64], stbi__huffman *hdc, stbi__huffman *hac, stbi__int16 *fac, int b, stbi_uc *dequant)
1603 int diff,dc,k;
1604 int t;
1606 if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
1607 t = stbi__jpeg_huff_decode(j, hdc);
1608 if (t < 0) return stbi__err("bad huffman code","Corrupt JPEG");
1610 // 0 all the ac values now so we can do it 32-bits at a time
1611 memset(data,0,64*sizeof(data[0]));
1613 diff = t ? stbi__extend_receive(j, t) : 0;
1614 dc = j->img_comp[b].dc_pred + diff;
1615 j->img_comp[b].dc_pred = dc;
1616 data[0] = (short) (dc * dequant[0]);
1618 // decode AC components, see JPEG spec
1619 k = 1;
1620 do {
1621 unsigned int zig;
1622 int c,r,s;
1623 if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
1624 c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
1625 r = fac[c];
1626 if (r) { // fast-AC path
1627 k += (r >> 4) & 15; // run
1628 s = r & 15; // combined length
1629 j->code_buffer <<= s;
1630 j->code_bits -= s;
1631 // decode into unzigzag'd location
1632 zig = stbi__jpeg_dezigzag[k++];
1633 data[zig] = (short) ((r >> 8) * dequant[zig]);
1634 } else {
1635 int rs = stbi__jpeg_huff_decode(j, hac);
1636 if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
1637 s = rs & 15;
1638 r = rs >> 4;
1639 if (s == 0) {
1640 if (rs != 0xf0) break; // end block
1641 k += 16;
1642 } else {
1643 k += r;
1644 // decode into unzigzag'd location
1645 zig = stbi__jpeg_dezigzag[k++];
1646 data[zig] = (short) (stbi__extend_receive(j,s) * dequant[zig]);
1649 } while (k < 64);
1650 return 1;
1653 static int stbi__jpeg_decode_block_prog_dc(stbi__jpeg *j, short data[64], stbi__huffman *hdc, int b)
1655 int diff,dc;
1656 int t;
1657 if (j->spec_end != 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
1659 if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
1661 if (j->succ_high == 0) {
1662 // first scan for DC coefficient, must be first
1663 memset(data,0,64*sizeof(data[0])); // 0 all the ac values now
1664 t = stbi__jpeg_huff_decode(j, hdc);
1665 diff = t ? stbi__extend_receive(j, t) : 0;
1667 dc = j->img_comp[b].dc_pred + diff;
1668 j->img_comp[b].dc_pred = dc;
1669 data[0] = (short) (dc << j->succ_low);
1670 } else {
1671 // refinement scan for DC coefficient
1672 if (stbi__jpeg_get_bit(j))
1673 data[0] += (short) (1 << j->succ_low);
1675 return 1;
1678 // @OPTIMIZE: store non-zigzagged during the decode passes,
1679 // and only de-zigzag when dequantizing
1680 static int stbi__jpeg_decode_block_prog_ac(stbi__jpeg *j, short data[64], stbi__huffman *hac, stbi__int16 *fac)
1682 int k;
1683 if (j->spec_start == 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
1685 if (j->succ_high == 0) {
1686 int shift = j->succ_low;
1688 if (j->eob_run) {
1689 --j->eob_run;
1690 return 1;
1693 k = j->spec_start;
1694 do {
1695 unsigned int zig;
1696 int c,r,s;
1697 if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
1698 c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
1699 r = fac[c];
1700 if (r) { // fast-AC path
1701 k += (r >> 4) & 15; // run
1702 s = r & 15; // combined length
1703 j->code_buffer <<= s;
1704 j->code_bits -= s;
1705 zig = stbi__jpeg_dezigzag[k++];
1706 data[zig] = (short) ((r >> 8) << shift);
1707 } else {
1708 int rs = stbi__jpeg_huff_decode(j, hac);
1709 if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
1710 s = rs & 15;
1711 r = rs >> 4;
1712 if (s == 0) {
1713 if (r < 15) {
1714 j->eob_run = (1 << r);
1715 if (r)
1716 j->eob_run += stbi__jpeg_get_bits(j, r);
1717 --j->eob_run;
1718 break;
1720 k += 16;
1721 } else {
1722 k += r;
1723 zig = stbi__jpeg_dezigzag[k++];
1724 data[zig] = (short) (stbi__extend_receive(j,s) << shift);
1727 } while (k <= j->spec_end);
1728 } else {
1729 // refinement scan for these AC coefficients
1731 short bit = (short) (1 << j->succ_low);
1733 if (j->eob_run) {
1734 --j->eob_run;
1735 for (k = j->spec_start; k <= j->spec_end; ++k) {
1736 short *p = &data[stbi__jpeg_dezigzag[k]];
1737 if (*p != 0)
1738 if (stbi__jpeg_get_bit(j))
1739 if ((*p & bit)==0) {
1740 if (*p > 0)
1741 *p += bit;
1742 else
1743 *p -= bit;
1746 } else {
1747 k = j->spec_start;
1748 do {
1749 int r,s;
1750 int rs = stbi__jpeg_huff_decode(j, hac); // @OPTIMIZE see if we can use the fast path here, advance-by-r is so slow, eh
1751 if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
1752 s = rs & 15;
1753 r = rs >> 4;
1754 if (s == 0) {
1755 if (r < 15) {
1756 j->eob_run = (1 << r) - 1;
1757 if (r)
1758 j->eob_run += stbi__jpeg_get_bits(j, r);
1759 r = 64; // force end of block
1760 } else
1761 r = 16; // r=15 is the code for 16 0s
1762 } else {
1763 if (s != 1) return stbi__err("bad huffman code", "Corrupt JPEG");
1764 // sign bit
1765 if (stbi__jpeg_get_bit(j))
1766 s = bit;
1767 else
1768 s = -bit;
1771 // advance by r
1772 while (k <= j->spec_end) {
1773 short *p = &data[stbi__jpeg_dezigzag[k]];
1774 if (*p != 0) {
1775 if (stbi__jpeg_get_bit(j))
1776 if ((*p & bit)==0) {
1777 if (*p > 0)
1778 *p += bit;
1779 else
1780 *p -= bit;
1782 ++k;
1783 } else {
1784 if (r == 0) {
1785 if (s)
1786 data[stbi__jpeg_dezigzag[k++]] = (short) s;
1787 break;
1789 --r;
1790 ++k;
1793 } while (k <= j->spec_end);
1796 return 1;
// clamp an int to the 0..255 range of stbi_uc (the IDCT adds its +128
// level shift before calling)
stbi_inline static stbi_uc stbi__clamp(int x)
{
   // trick to use a single test to catch both cases
   if ((unsigned int) x > 255) {
      if (x < 0) return 0;
      if (x > 255) return 255;
   }
   return (stbi_uc) x;
}
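/* A sketch of the single-test trick in stbi__clamp: converting a negative
   int to unsigned wraps it to a very large value, so one unsigned comparison
   against 255 catches both out-of-range directions, and the common in-range
   case costs a single branch. The demo_* names are hypothetical. */

```c
#include <assert.h>

static unsigned char demo_clamp_u8(int x)
{
   if ((unsigned int) x > 255)    /* true for x < 0 and for x > 255 */
      return x < 0 ? 0 : 255;
   return (unsigned char) x;
}

/* Compares the trick against the plain two-comparison clamp over a range. */
static void demo_clamp_selfcheck(void)
{
   int x;
   for (x = -1000; x <= 1000; ++x) {
      int expected = x < 0 ? 0 : (x > 255 ? 255 : x);
      assert(demo_clamp_u8(x) == expected);
   }
}
```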
1810 #define stbi__f2f(x) ((int) (((x) * 4096 + 0.5)))
1811 #define stbi__fsh(x) ((x) << 12)
1813 // derived from jidctint -- DCT_ISLOW
1814 #define STBI__IDCT_1D(s0,s1,s2,s3,s4,s5,s6,s7) \
1815 int t0,t1,t2,t3,p1,p2,p3,p4,p5,x0,x1,x2,x3; \
1816 p2 = s2; \
1817 p3 = s6; \
1818 p1 = (p2+p3) * stbi__f2f(0.5411961f); \
1819 t2 = p1 + p3*stbi__f2f(-1.847759065f); \
1820 t3 = p1 + p2*stbi__f2f( 0.765366865f); \
1821 p2 = s0; \
1822 p3 = s4; \
1823 t0 = stbi__fsh(p2+p3); \
1824 t1 = stbi__fsh(p2-p3); \
1825 x0 = t0+t3; \
1826 x3 = t0-t3; \
1827 x1 = t1+t2; \
1828 x2 = t1-t2; \
1829 t0 = s7; \
1830 t1 = s5; \
1831 t2 = s3; \
1832 t3 = s1; \
1833 p3 = t0+t2; \
1834 p4 = t1+t3; \
1835 p1 = t0+t3; \
1836 p2 = t1+t2; \
1837 p5 = (p3+p4)*stbi__f2f( 1.175875602f); \
1838 t0 = t0*stbi__f2f( 0.298631336f); \
1839 t1 = t1*stbi__f2f( 2.053119869f); \
1840 t2 = t2*stbi__f2f( 3.072711026f); \
1841 t3 = t3*stbi__f2f( 1.501321110f); \
1842 p1 = p5 + p1*stbi__f2f(-0.899976223f); \
1843 p2 = p5 + p2*stbi__f2f(-2.562915447f); \
1844 p3 = p3*stbi__f2f(-1.961570560f); \
1845 p4 = p4*stbi__f2f(-0.390180644f); \
1846 t3 += p1+p4; \
1847 t2 += p2+p3; \
1848 t1 += p2+p4; \
1849 t0 += p1+p3;
1851 static void stbi__idct_block(stbi_uc *out, int out_stride, short data[64])
1853 int i,val[64],*v=val;
1854 stbi_uc *o;
1855 short *d = data;
1857 // columns
1858 for (i=0; i < 8; ++i,++d, ++v) {
1859 // if all zeroes, shortcut -- this avoids dequantizing 0s and IDCTing
1860 if (d[ 8]==0 && d[16]==0 && d[24]==0 && d[32]==0
1861 && d[40]==0 && d[48]==0 && d[56]==0) {
1862 // no shortcut 0 seconds
1863 // (1|2|3|4|5|6|7)==0 0 seconds
1864 // all separate -0.047 seconds
1865 // 1 && 2|3 && 4|5 && 6|7: -0.047 seconds
1866 int dcterm = d[0] << 2;
1867 v[0] = v[8] = v[16] = v[24] = v[32] = v[40] = v[48] = v[56] = dcterm;
1868 } else {
1869 STBI__IDCT_1D(d[ 0],d[ 8],d[16],d[24],d[32],d[40],d[48],d[56])
1870 // constants scaled things up by 1<<12; let's bring them back
1871 // down, but keep 2 extra bits of precision
1872 x0 += 512; x1 += 512; x2 += 512; x3 += 512;
1873 v[ 0] = (x0+t3) >> 10;
1874 v[56] = (x0-t3) >> 10;
1875 v[ 8] = (x1+t2) >> 10;
1876 v[48] = (x1-t2) >> 10;
1877 v[16] = (x2+t1) >> 10;
1878 v[40] = (x2-t1) >> 10;
1879 v[24] = (x3+t0) >> 10;
1880 v[32] = (x3-t0) >> 10;
1884 for (i=0, v=val, o=out; i < 8; ++i,v+=8,o+=out_stride) {
1885 // no fast case since the first 1D IDCT spread components out
1886 STBI__IDCT_1D(v[0],v[1],v[2],v[3],v[4],v[5],v[6],v[7])
1887 // constants scaled things up by 1<<12, plus we had 1<<2 from first
1888 // loop, plus horizontal and vertical each scale by sqrt(8) so together
1889 // we've got an extra 1<<3, so 1<<17 total we need to remove.
1890 // so we want to round that, which means adding 0.5 * 1<<17,
1891 // aka 65536. Also, we'll end up with -128 to 127 that we want
1892 // to encode as 0..255 by adding 128, so we'll add that before the shift
1893 x0 += 65536 + (128<<17);
1894 x1 += 65536 + (128<<17);
1895 x2 += 65536 + (128<<17);
1896 x3 += 65536 + (128<<17);
1897 // tried computing the shifts into temps, or'ing the temps to see
1898 // if any were out of range, but that was slower
1899 o[0] = stbi__clamp((x0+t3) >> 17);
1900 o[7] = stbi__clamp((x0-t3) >> 17);
1901 o[1] = stbi__clamp((x1+t2) >> 17);
1902 o[6] = stbi__clamp((x1-t2) >> 17);
1903 o[2] = stbi__clamp((x2+t1) >> 17);
1904 o[5] = stbi__clamp((x2-t1) >> 17);
1905 o[3] = stbi__clamp((x3+t0) >> 17);
1906 o[4] = stbi__clamp((x3-t0) >> 17);
1910 #ifdef STBI_SSE2
1911 // sse2 integer IDCT. not the fastest possible implementation but it
1912 // produces bit-identical results to the generic C version so it's
1913 // fully "transparent".
1914 static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
1916 // This is constructed to match our regular (generic) integer IDCT exactly.
1917 __m128i row0, row1, row2, row3, row4, row5, row6, row7;
1918 __m128i tmp;
1920 // dot product constant: even elems=x, odd elems=y
1921 #define dct_const(x,y) _mm_setr_epi16((x),(y),(x),(y),(x),(y),(x),(y))
1923 // out(0) = c0[even]*x + c0[odd]*y (c0, x, y 16-bit, out 32-bit)
1924 // out(1) = c1[even]*x + c1[odd]*y
1925 #define dct_rot(out0,out1, x,y,c0,c1) \
1926 __m128i c0##lo = _mm_unpacklo_epi16((x),(y)); \
1927 __m128i c0##hi = _mm_unpackhi_epi16((x),(y)); \
1928 __m128i out0##_l = _mm_madd_epi16(c0##lo, c0); \
1929 __m128i out0##_h = _mm_madd_epi16(c0##hi, c0); \
1930 __m128i out1##_l = _mm_madd_epi16(c0##lo, c1); \
1931 __m128i out1##_h = _mm_madd_epi16(c0##hi, c1)
1933 // out = in << 12 (in 16-bit, out 32-bit)
1934 #define dct_widen(out, in) \
1935 __m128i out##_l = _mm_srai_epi32(_mm_unpacklo_epi16(_mm_setzero_si128(), (in)), 4); \
1936 __m128i out##_h = _mm_srai_epi32(_mm_unpackhi_epi16(_mm_setzero_si128(), (in)), 4)
1938 // wide add
1939 #define dct_wadd(out, a, b) \
1940 __m128i out##_l = _mm_add_epi32(a##_l, b##_l); \
1941 __m128i out##_h = _mm_add_epi32(a##_h, b##_h)
1943 // wide sub
1944 #define dct_wsub(out, a, b) \
1945 __m128i out##_l = _mm_sub_epi32(a##_l, b##_l); \
1946 __m128i out##_h = _mm_sub_epi32(a##_h, b##_h)
   // butterfly a/b, add bias, then shift by "s" and pack
   #define dct_bfly32o(out0, out1, a,b,bias,s) \
      { \
         __m128i abiased_l = _mm_add_epi32(a##_l, bias); \
         __m128i abiased_h = _mm_add_epi32(a##_h, bias); \
         dct_wadd(sum, abiased, b); \
         dct_wsub(dif, abiased, b); \
         out0 = _mm_packs_epi32(_mm_srai_epi32(sum_l, s), _mm_srai_epi32(sum_h, s)); \
         out1 = _mm_packs_epi32(_mm_srai_epi32(dif_l, s), _mm_srai_epi32(dif_h, s)); \
      }
1959 // 8-bit interleave step (for transposes)
1960 #define dct_interleave8(a, b) \
1961 tmp = a; \
1962 a = _mm_unpacklo_epi8(a, b); \
1963 b = _mm_unpackhi_epi8(tmp, b)
1965 // 16-bit interleave step (for transposes)
1966 #define dct_interleave16(a, b) \
1967 tmp = a; \
1968 a = _mm_unpacklo_epi16(a, b); \
1969 b = _mm_unpackhi_epi16(tmp, b)
   #define dct_pass(bias,shift) \
      { \
         /* even part */ \
         dct_rot(t2e,t3e, row2,row6, rot0_0,rot0_1); \
         __m128i sum04 = _mm_add_epi16(row0, row4); \
         __m128i dif04 = _mm_sub_epi16(row0, row4); \
         dct_widen(t0e, sum04); \
         dct_widen(t1e, dif04); \
         dct_wadd(x0, t0e, t3e); \
         dct_wsub(x3, t0e, t3e); \
         dct_wadd(x1, t1e, t2e); \
         dct_wsub(x2, t1e, t2e); \
         /* odd part */ \
         dct_rot(y0o,y2o, row7,row3, rot2_0,rot2_1); \
         dct_rot(y1o,y3o, row5,row1, rot3_0,rot3_1); \
         __m128i sum17 = _mm_add_epi16(row1, row7); \
         __m128i sum35 = _mm_add_epi16(row3, row5); \
         dct_rot(y4o,y5o, sum17,sum35, rot1_0,rot1_1); \
         dct_wadd(x4, y0o, y4o); \
         dct_wadd(x5, y1o, y5o); \
         dct_wadd(x6, y2o, y5o); \
         dct_wadd(x7, y3o, y4o); \
         dct_bfly32o(row0,row7, x0,x7,bias,shift); \
         dct_bfly32o(row1,row6, x1,x6,bias,shift); \
         dct_bfly32o(row2,row5, x2,x5,bias,shift); \
         dct_bfly32o(row3,row4, x3,x4,bias,shift); \
      }
1999 __m128i rot0_0 = dct_const(stbi__f2f(0.5411961f), stbi__f2f(0.5411961f) + stbi__f2f(-1.847759065f));
2000 __m128i rot0_1 = dct_const(stbi__f2f(0.5411961f) + stbi__f2f( 0.765366865f), stbi__f2f(0.5411961f));
2001 __m128i rot1_0 = dct_const(stbi__f2f(1.175875602f) + stbi__f2f(-0.899976223f), stbi__f2f(1.175875602f));
2002 __m128i rot1_1 = dct_const(stbi__f2f(1.175875602f), stbi__f2f(1.175875602f) + stbi__f2f(-2.562915447f));
2003 __m128i rot2_0 = dct_const(stbi__f2f(-1.961570560f) + stbi__f2f( 0.298631336f), stbi__f2f(-1.961570560f));
2004 __m128i rot2_1 = dct_const(stbi__f2f(-1.961570560f), stbi__f2f(-1.961570560f) + stbi__f2f( 3.072711026f));
2005 __m128i rot3_0 = dct_const(stbi__f2f(-0.390180644f) + stbi__f2f( 2.053119869f), stbi__f2f(-0.390180644f));
2006 __m128i rot3_1 = dct_const(stbi__f2f(-0.390180644f), stbi__f2f(-0.390180644f) + stbi__f2f( 1.501321110f));
2008 // rounding biases in column/row passes, see stbi__idct_block for explanation.
2009 __m128i bias_0 = _mm_set1_epi32(512);
2010 __m128i bias_1 = _mm_set1_epi32(65536 + (128<<17));
2012 // load
2013 row0 = _mm_load_si128((const __m128i *) (data + 0*8));
2014 row1 = _mm_load_si128((const __m128i *) (data + 1*8));
2015 row2 = _mm_load_si128((const __m128i *) (data + 2*8));
2016 row3 = _mm_load_si128((const __m128i *) (data + 3*8));
2017 row4 = _mm_load_si128((const __m128i *) (data + 4*8));
2018 row5 = _mm_load_si128((const __m128i *) (data + 5*8));
2019 row6 = _mm_load_si128((const __m128i *) (data + 6*8));
2020 row7 = _mm_load_si128((const __m128i *) (data + 7*8));
2022 // column pass
2023 dct_pass(bias_0, 10);
2026 // 16bit 8x8 transpose pass 1
2027 dct_interleave16(row0, row4);
2028 dct_interleave16(row1, row5);
2029 dct_interleave16(row2, row6);
2030 dct_interleave16(row3, row7);
2032 // transpose pass 2
2033 dct_interleave16(row0, row2);
2034 dct_interleave16(row1, row3);
2035 dct_interleave16(row4, row6);
2036 dct_interleave16(row5, row7);
2038 // transpose pass 3
2039 dct_interleave16(row0, row1);
2040 dct_interleave16(row2, row3);
2041 dct_interleave16(row4, row5);
2042 dct_interleave16(row6, row7);
2045 // row pass
2046 dct_pass(bias_1, 17);
2049 // pack
2050 __m128i p0 = _mm_packus_epi16(row0, row1); // a0a1a2a3...a7b0b1b2b3...b7
2051 __m128i p1 = _mm_packus_epi16(row2, row3);
2052 __m128i p2 = _mm_packus_epi16(row4, row5);
2053 __m128i p3 = _mm_packus_epi16(row6, row7);
2055 // 8bit 8x8 transpose pass 1
2056 dct_interleave8(p0, p2); // a0e0a1e1...
2057 dct_interleave8(p1, p3); // c0g0c1g1...
2059 // transpose pass 2
2060 dct_interleave8(p0, p1); // a0c0e0g0...
2061 dct_interleave8(p2, p3); // b0d0f0h0...
2063 // transpose pass 3
2064 dct_interleave8(p0, p2); // a0b0c0d0...
2065 dct_interleave8(p1, p3); // a4b4c4d4...
2067 // store
2068 _mm_storel_epi64((__m128i *) out, p0); out += out_stride;
2069 _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p0, 0x4e)); out += out_stride;
2070 _mm_storel_epi64((__m128i *) out, p2); out += out_stride;
2071 _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p2, 0x4e)); out += out_stride;
2072 _mm_storel_epi64((__m128i *) out, p1); out += out_stride;
2073 _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p1, 0x4e)); out += out_stride;
2074 _mm_storel_epi64((__m128i *) out, p3); out += out_stride;
2075 _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p3, 0x4e));
2078 #undef dct_const
2079 #undef dct_rot
2080 #undef dct_widen
2081 #undef dct_wadd
2082 #undef dct_wsub
2083 #undef dct_bfly32o
2084 #undef dct_interleave8
2085 #undef dct_interleave16
2086 #undef dct_pass
2089 #endif // STBI_SSE2
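// Illustration only, not part of stb_image: a scalar sketch of two per-lane
// tricks the SSE2 IDCT above relies on. It assumes dct_const(x,y) fills the
// even 16-bit lanes with x and the odd lanes with y (its definition is not
// shown in this part of the listing). All sketch_* names are hypothetical.
// The code also assumes two's complement conversion and an arithmetic right
// shift on signed integers, which mainstream compilers provide.

```c
#include <stdint.h>

/* One lane of dct_widen: the 16-bit input is placed in the upper half of a
   32-bit lane (zeros below it), then shifted right arithmetically by 4.
   The net effect is in * 4096 with the sign preserved. */
static int32_t sketch_widen(int16_t in)
{
   /* build the lane as unsigned so no negative value is shifted left */
   uint32_t lane = (uint32_t)(uint16_t)in << 16;
   /* assumption: conversion wraps and >> on a signed value is arithmetic */
   return (int32_t)lane >> 4;
}

/* One 32-bit lane of _mm_madd_epi16 as dct_rot uses it:
   out = c_even*x + c_odd*y, with 16-bit inputs and a 32-bit result. */
static int32_t sketch_madd(int16_t x, int16_t y, int16_t c_even, int16_t c_odd)
{
   return (int32_t)c_even * x + (int32_t)c_odd * y;
}
```

// With c_even = k and c_odd = k + a, sketch_madd(x, y, k, k + a) equals
// (x + y)*k + y*a, which is why each rot constant packs a sum of two
// fixed-point factors into one of its lanes.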
2091 #ifdef STBI_NEON
2093 // NEON integer IDCT. should produce bit-identical
2094 // results to the generic C version.
2095 static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
2097 int16x8_t row0, row1, row2, row3, row4, row5, row6, row7;
2099 int16x4_t rot0_0 = vdup_n_s16(stbi__f2f(0.5411961f));
2100 int16x4_t rot0_1 = vdup_n_s16(stbi__f2f(-1.847759065f));
2101 int16x4_t rot0_2 = vdup_n_s16(stbi__f2f( 0.765366865f));
2102 int16x4_t rot1_0 = vdup_n_s16(stbi__f2f( 1.175875602f));
2103 int16x4_t rot1_1 = vdup_n_s16(stbi__f2f(-0.899976223f));
2104 int16x4_t rot1_2 = vdup_n_s16(stbi__f2f(-2.562915447f));
2105 int16x4_t rot2_0 = vdup_n_s16(stbi__f2f(-1.961570560f));
2106 int16x4_t rot2_1 = vdup_n_s16(stbi__f2f(-0.390180644f));
2107 int16x4_t rot3_0 = vdup_n_s16(stbi__f2f( 0.298631336f));
2108 int16x4_t rot3_1 = vdup_n_s16(stbi__f2f( 2.053119869f));
2109 int16x4_t rot3_2 = vdup_n_s16(stbi__f2f( 3.072711026f));
2110 int16x4_t rot3_3 = vdup_n_s16(stbi__f2f( 1.501321110f));
2112 #define dct_long_mul(out, inq, coeff) \
2113 int32x4_t out##_l = vmull_s16(vget_low_s16(inq), coeff); \
2114 int32x4_t out##_h = vmull_s16(vget_high_s16(inq), coeff)
2116 #define dct_long_mac(out, acc, inq, coeff) \
2117 int32x4_t out##_l = vmlal_s16(acc##_l, vget_low_s16(inq), coeff); \
2118 int32x4_t out##_h = vmlal_s16(acc##_h, vget_high_s16(inq), coeff)
2120 #define dct_widen(out, inq) \
2121 int32x4_t out##_l = vshll_n_s16(vget_low_s16(inq), 12); \
2122 int32x4_t out##_h = vshll_n_s16(vget_high_s16(inq), 12)
2124 // wide add
2125 #define dct_wadd(out, a, b) \
2126 int32x4_t out##_l = vaddq_s32(a##_l, b##_l); \
2127 int32x4_t out##_h = vaddq_s32(a##_h, b##_h)
2129 // wide sub
2130 #define dct_wsub(out, a, b) \
2131 int32x4_t out##_l = vsubq_s32(a##_l, b##_l); \
2132 int32x4_t out##_h = vsubq_s32(a##_h, b##_h)
2134 // butterfly a/b, then shift using "shiftop" by "s" and pack
2135 #define dct_bfly32o(out0,out1, a,b,shiftop,s) \
2137 dct_wadd(sum, a, b); \
2138 dct_wsub(dif, a, b); \
2139 out0 = vcombine_s16(shiftop(sum_l, s), shiftop(sum_h, s)); \
2140 out1 = vcombine_s16(shiftop(dif_l, s), shiftop(dif_h, s)); \
2143 #define dct_pass(shiftop, shift) \
2145 /* even part */ \
2146 int16x8_t sum26 = vaddq_s16(row2, row6); \
2147 dct_long_mul(p1e, sum26, rot0_0); \
2148 dct_long_mac(t2e, p1e, row6, rot0_1); \
2149 dct_long_mac(t3e, p1e, row2, rot0_2); \
2150 int16x8_t sum04 = vaddq_s16(row0, row4); \
2151 int16x8_t dif04 = vsubq_s16(row0, row4); \
2152 dct_widen(t0e, sum04); \
2153 dct_widen(t1e, dif04); \
2154 dct_wadd(x0, t0e, t3e); \
2155 dct_wsub(x3, t0e, t3e); \
2156 dct_wadd(x1, t1e, t2e); \
2157 dct_wsub(x2, t1e, t2e); \
2158 /* odd part */ \
2159 int16x8_t sum15 = vaddq_s16(row1, row5); \
2160 int16x8_t sum17 = vaddq_s16(row1, row7); \
2161 int16x8_t sum35 = vaddq_s16(row3, row5); \
2162 int16x8_t sum37 = vaddq_s16(row3, row7); \
2163 int16x8_t sumodd = vaddq_s16(sum17, sum35); \
2164 dct_long_mul(p5o, sumodd, rot1_0); \
2165 dct_long_mac(p1o, p5o, sum17, rot1_1); \
2166 dct_long_mac(p2o, p5o, sum35, rot1_2); \
2167 dct_long_mul(p3o, sum37, rot2_0); \
2168 dct_long_mul(p4o, sum15, rot2_1); \
2169 dct_wadd(sump13o, p1o, p3o); \
2170 dct_wadd(sump24o, p2o, p4o); \
2171 dct_wadd(sump23o, p2o, p3o); \
2172 dct_wadd(sump14o, p1o, p4o); \
2173 dct_long_mac(x4, sump13o, row7, rot3_0); \
2174 dct_long_mac(x5, sump24o, row5, rot3_1); \
2175 dct_long_mac(x6, sump23o, row3, rot3_2); \
2176 dct_long_mac(x7, sump14o, row1, rot3_3); \
2177 dct_bfly32o(row0,row7, x0,x7,shiftop,shift); \
2178 dct_bfly32o(row1,row6, x1,x6,shiftop,shift); \
2179 dct_bfly32o(row2,row5, x2,x5,shiftop,shift); \
2180 dct_bfly32o(row3,row4, x3,x4,shiftop,shift); \
2183 // load
2184 row0 = vld1q_s16(data + 0*8);
2185 row1 = vld1q_s16(data + 1*8);
2186 row2 = vld1q_s16(data + 2*8);
2187 row3 = vld1q_s16(data + 3*8);
2188 row4 = vld1q_s16(data + 4*8);
2189 row5 = vld1q_s16(data + 5*8);
2190 row6 = vld1q_s16(data + 6*8);
2191 row7 = vld1q_s16(data + 7*8);
2193 // add DC bias
2194 row0 = vaddq_s16(row0, vsetq_lane_s16(1024, vdupq_n_s16(0), 0));
2196 // column pass
2197 dct_pass(vrshrn_n_s32, 10);
2199 // 16bit 8x8 transpose
2201 // these three map to a single VTRN.16, VTRN.32, and VSWP, respectively.
2202 // whether compilers actually get this is another story, sadly.
2203 #define dct_trn16(x, y) { int16x8x2_t t = vtrnq_s16(x, y); x = t.val[0]; y = t.val[1]; }
2204 #define dct_trn32(x, y) { int32x4x2_t t = vtrnq_s32(vreinterpretq_s32_s16(x), vreinterpretq_s32_s16(y)); x = vreinterpretq_s16_s32(t.val[0]); y = vreinterpretq_s16_s32(t.val[1]); }
2205 #define dct_trn64(x, y) { int16x8_t x0 = x; int16x8_t y0 = y; x = vcombine_s16(vget_low_s16(x0), vget_low_s16(y0)); y = vcombine_s16(vget_high_s16(x0), vget_high_s16(y0)); }
2207 // pass 1
2208 dct_trn16(row0, row1); // a0b0a2b2a4b4a6b6
2209 dct_trn16(row2, row3);
2210 dct_trn16(row4, row5);
2211 dct_trn16(row6, row7);
2213 // pass 2
2214 dct_trn32(row0, row2); // a0b0c0d0a4b4c4d4
2215 dct_trn32(row1, row3);
2216 dct_trn32(row4, row6);
2217 dct_trn32(row5, row7);
2219 // pass 3
2220 dct_trn64(row0, row4); // a0b0c0d0e0f0g0h0
2221 dct_trn64(row1, row5);
2222 dct_trn64(row2, row6);
2223 dct_trn64(row3, row7);
2225 #undef dct_trn16
2226 #undef dct_trn32
2227 #undef dct_trn64
2230 // row pass
2231 // vrshrn_n_s32 only supports shifts up to 16, we need
2232 // 17. so do a non-rounding shift of 16 first then follow
2233 // up with a rounding shift by 1.
2234 dct_pass(vshrn_n_s32, 16);
2237 // pack and round
2238 uint8x8_t p0 = vqrshrun_n_s16(row0, 1);
2239 uint8x8_t p1 = vqrshrun_n_s16(row1, 1);
2240 uint8x8_t p2 = vqrshrun_n_s16(row2, 1);
2241 uint8x8_t p3 = vqrshrun_n_s16(row3, 1);
2242 uint8x8_t p4 = vqrshrun_n_s16(row4, 1);
2243 uint8x8_t p5 = vqrshrun_n_s16(row5, 1);
2244 uint8x8_t p6 = vqrshrun_n_s16(row6, 1);
2245 uint8x8_t p7 = vqrshrun_n_s16(row7, 1);
2247 // again, these can translate into one instruction, but often don't.
2248 #define dct_trn8_8(x, y) { uint8x8x2_t t = vtrn_u8(x, y); x = t.val[0]; y = t.val[1]; }
2249 #define dct_trn8_16(x, y) { uint16x4x2_t t = vtrn_u16(vreinterpret_u16_u8(x), vreinterpret_u16_u8(y)); x = vreinterpret_u8_u16(t.val[0]); y = vreinterpret_u8_u16(t.val[1]); }
2250 #define dct_trn8_32(x, y) { uint32x2x2_t t = vtrn_u32(vreinterpret_u32_u8(x), vreinterpret_u32_u8(y)); x = vreinterpret_u8_u32(t.val[0]); y = vreinterpret_u8_u32(t.val[1]); }
2252 // sadly can't use interleaved stores here since we only write
2253 // 8 bytes to each scan line!
2255 // 8x8 8-bit transpose pass 1
2256 dct_trn8_8(p0, p1);
2257 dct_trn8_8(p2, p3);
2258 dct_trn8_8(p4, p5);
2259 dct_trn8_8(p6, p7);
2261 // pass 2
2262 dct_trn8_16(p0, p2);
2263 dct_trn8_16(p1, p3);
2264 dct_trn8_16(p4, p6);
2265 dct_trn8_16(p5, p7);
2267 // pass 3
2268 dct_trn8_32(p0, p4);
2269 dct_trn8_32(p1, p5);
2270 dct_trn8_32(p2, p6);
2271 dct_trn8_32(p3, p7);
2273 // store
2274 vst1_u8(out, p0); out += out_stride;
2275 vst1_u8(out, p1); out += out_stride;
2276 vst1_u8(out, p2); out += out_stride;
2277 vst1_u8(out, p3); out += out_stride;
2278 vst1_u8(out, p4); out += out_stride;
2279 vst1_u8(out, p5); out += out_stride;
2280 vst1_u8(out, p6); out += out_stride;
2281 vst1_u8(out, p7);
2283 #undef dct_trn8_8
2284 #undef dct_trn8_16
2285 #undef dct_trn8_32
2288 #undef dct_long_mul
2289 #undef dct_long_mac
2290 #undef dct_widen
2291 #undef dct_wadd
2292 #undef dct_wsub
2293 #undef dct_bfly32o
2294 #undef dct_pass
2297 #endif // STBI_NEON
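// Illustration only, not part of stb_image: a scalar sketch of why the NEON
// row pass can split the shift by 17 into a plain shift by 16 followed by a
// rounding shift by 1. It models the rounding only; the narrowing to 16 bits
// and the saturation to 0..255 done by the real instructions are left out.
// All sketch_* names are hypothetical, and an arithmetic right shift on
// signed integers is assumed.

```c
#include <stdint.h>

/* rounding right shift by 17 in a single step: (x + 2^16) >> 17 */
static int32_t sketch_rshift17(int32_t x)
{
   return (int32_t)(((int64_t)x + 65536) >> 17);
}

/* the same value in two steps */
static int32_t sketch_rshift16_then_1(int32_t x)
{
   int32_t y = x >> 16;   /* truncating shift, as vshrn_n_s32(x, 16) does per lane */
   return (y + 1) >> 1;   /* rounding shift by 1, as vqrshrun_n_s16(y, 1) does before saturating */
}

/* returns 1 if both forms agree for every sampled x in [lo, hi] */
static int sketch_shift_split_agrees(int32_t lo, int32_t hi, int32_t step)
{
   int64_t x;
   for (x = lo; x <= hi; x += step)
      if (sketch_rshift17((int32_t)x) != sketch_rshift16_then_1((int32_t)x))
         return 0;
   return 1;
}
```

// The two forms match because flooring twice (by 2^16, then by 2 after adding
// 1) gives the same result as flooring (x + 2^16) by 2^17 once.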
2299 #define STBI__MARKER_none 0xff
2300 // if there's a pending marker from the entropy stream, return that
2301 // otherwise, fetch from the stream and get a marker. if there's no
2302 // marker, return 0xff, which is never a valid marker value
2303 static stbi_uc stbi__get_marker(stbi__jpeg *j)
2305 stbi_uc x;
2306 if (j->marker != STBI__MARKER_none) { x = j->marker; j->marker = STBI__MARKER_none; return x; }
2307 x = stbi__get8(j->s);
2308 if (x != 0xff) return STBI__MARKER_none;
2309 while (x == 0xff)
2310 x = stbi__get8(j->s);
2311 return x;
2314 // in each scan, we'll have scan_n components, and the order
2315 // of the components is specified by order[]
2316 #define STBI__RESTART(x) ((x) >= 0xd0 && (x) <= 0xd7)
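// Illustration only, not part of stb_image: a minimal sketch of the marker
// scan above, run over a plain byte buffer instead of an stbi__context. The
// reader type and every sketch_*/SKETCH_* name are hypothetical; reading past
// the end returns 0 here, which is an assumption of this sketch.

```c
#include <stddef.h>

#define SKETCH_MARKER_NONE 0xff
#define SKETCH_RESTART(x)  ((x) >= 0xd0 && (x) <= 0xd7)

typedef struct {
   const unsigned char *p;
   size_t len, pos;
} sketch_reader;

/* next byte, or 0 once the buffer is exhausted */
static unsigned char sketch_get8(sketch_reader *r)
{
   return r->pos < r->len ? r->p[r->pos++] : 0;
}

/* a marker is 0xff followed by a non-0xff code; extra 0xff bytes are fill */
static unsigned char sketch_get_marker(sketch_reader *r)
{
   unsigned char x = sketch_get8(r);
   if (x != 0xff) return SKETCH_MARKER_NONE;  /* not positioned at a marker */
   while (x == 0xff)
      x = sketch_get8(r);                     /* skip fill bytes */
   return x;
}

/* convenience wrapper: marker found at the very start of a buffer */
static unsigned char sketch_first_marker(const unsigned char *bytes, size_t len)
{
   sketch_reader r;
   r.p = bytes; r.len = len; r.pos = 0;
   return sketch_get_marker(&r);
}
```

// For the bytes ff ff d3 the sketch returns 0xd3, which SKETCH_RESTART accepts
// as one of the eight restart markers RST0..RST7.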
2318 // after a restart interval, reset the entropy decoder and
2319 // the dc prediction
2320 static void stbi__jpeg_reset(stbi__jpeg *j)
2322 j->code_bits = 0;
2323 j->code_buffer = 0;
2324 j->nomore = 0;
2325 j->img_comp[0].dc_pred = j->img_comp[1].dc_pred = j->img_comp[2].dc_pred = 0;
2326 j->marker = STBI__MARKER_none;
2327 j->todo = j->restart_interval ? j->restart_interval : 0x7fffffff;
2328 j->eob_run = 0;
2329 // no more than 1<<31 MCUs if no restart_interval? that's plenty safe,
2330 // since we don't even allow 1<<30 pixels
2333 static int stbi__parse_entropy_coded_data(stbi__jpeg *z)
2335 stbi__jpeg_reset(z);
2336 if (!z->progressive) {
2337 if (z->scan_n == 1) {
2338 int i,j;
2339 STBI_SIMD_ALIGN(short, data[64]);
2340 int n = z->order[0];
2341 // non-interleaved data, we just need to process one block at a time,
2342 // in trivial scanline order
2343 // number of blocks to do just depends on how many actual "pixels" this
2344 // component has, independent of interleaved MCU blocking and such
2345 int w = (z->img_comp[n].x+7) >> 3;
2346 int h = (z->img_comp[n].y+7) >> 3;
2347 for (j=0; j < h; ++j) {
2348 for (i=0; i < w; ++i) {
2349 int ha = z->img_comp[n].ha;
2350 if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
2351 z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
2352 // every data block is an MCU, so count down the restart interval
2353 if (--z->todo <= 0) {
2354 if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2355 // if it's NOT a restart, then just bail, so we get corrupt data
2356 // rather than no data
2357 if (!STBI__RESTART(z->marker)) return 1;
2358 stbi__jpeg_reset(z);
2362 return 1;
2363 } else { // interleaved
2364 int i,j,k,x,y;
2365 STBI_SIMD_ALIGN(short, data[64]);
2366 for (j=0; j < z->img_mcu_y; ++j) {
2367 for (i=0; i < z->img_mcu_x; ++i) {
2368 // scan an interleaved mcu... process scan_n components in order
2369 for (k=0; k < z->scan_n; ++k) {
2370 int n = z->order[k];
2371 // scan out an mcu's worth of this component; that's just determined
2372 // by the basic H and V specified for the component
2373 for (y=0; y < z->img_comp[n].v; ++y) {
2374 for (x=0; x < z->img_comp[n].h; ++x) {
2375 int x2 = (i*z->img_comp[n].h + x)*8;
2376 int y2 = (j*z->img_comp[n].v + y)*8;
2377 int ha = z->img_comp[n].ha;
2378 if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
2379 z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*y2+x2, z->img_comp[n].w2, data);
2383 // after all interleaved components, that's an interleaved MCU,
2384 // so now count down the restart interval
2385 if (--z->todo <= 0) {
2386 if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2387 if (!STBI__RESTART(z->marker)) return 1;
2388 stbi__jpeg_reset(z);
2392 return 1;
2394 } else {
2395 if (z->scan_n == 1) {
2396 int i,j;
2397 int n = z->order[0];
2398 // non-interleaved data, we just need to process one block at a time,
2399 // in trivial scanline order
2400 // number of blocks to do just depends on how many actual "pixels" this
2401 // component has, independent of interleaved MCU blocking and such
2402 int w = (z->img_comp[n].x+7) >> 3;
2403 int h = (z->img_comp[n].y+7) >> 3;
2404 for (j=0; j < h; ++j) {
2405 for (i=0; i < w; ++i) {
2406 short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
2407 if (z->spec_start == 0) {
2408 if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
2409 return 0;
2410 } else {
2411 int ha = z->img_comp[n].ha;
2412 if (!stbi__jpeg_decode_block_prog_ac(z, data, &z->huff_ac[ha], z->fast_ac[ha]))
2413 return 0;
2415 // every data block is an MCU, so count down the restart interval
2416 if (--z->todo <= 0) {
2417 if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2418 if (!STBI__RESTART(z->marker)) return 1;
2419 stbi__jpeg_reset(z);
2423 return 1;
2424 } else { // interleaved
2425 int i,j,k,x,y;
2426 for (j=0; j < z->img_mcu_y; ++j) {
2427 for (i=0; i < z->img_mcu_x; ++i) {
2428 // scan an interleaved mcu... process scan_n components in order
2429 for (k=0; k < z->scan_n; ++k) {
2430 int n = z->order[k];
2431 // scan out an mcu's worth of this component; that's just determined
2432 // by the basic H and V specified for the component
2433 for (y=0; y < z->img_comp[n].v; ++y) {
2434 for (x=0; x < z->img_comp[n].h; ++x) {
2435 int x2 = (i*z->img_comp[n].h + x);
2436 int y2 = (j*z->img_comp[n].v + y);
2437 short *data = z->img_comp[n].coeff + 64 * (x2 + y2 * z->img_comp[n].coeff_w);
2438 if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
2439 return 0;
2443 // after all interleaved components, that's an interleaved MCU,
2444 // so now count down the restart interval
2445 if (--z->todo <= 0) {
2446 if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2447 if (!STBI__RESTART(z->marker)) return 1;
2448 stbi__jpeg_reset(z);
2452 return 1;
2457 static void stbi__jpeg_dequantize(short *data, stbi_uc *dequant)
2459 int i;
2460 for (i=0; i < 64; ++i)
2461 data[i] *= dequant[i];
2464 static void stbi__jpeg_finish(stbi__jpeg *z)
2466 if (z->progressive) {
2467 // dequantize and idct the data
2468 int i,j,n;
2469 for (n=0; n < z->s->img_n; ++n) {
2470 int w = (z->img_comp[n].x+7) >> 3;
2471 int h = (z->img_comp[n].y+7) >> 3;
2472 for (j=0; j < h; ++j) {
2473 for (i=0; i < w; ++i) {
2474 short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
2475 stbi__jpeg_dequantize(data, z->dequant[z->img_comp[n].tq]);
2476 z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
2483 static int stbi__process_marker(stbi__jpeg *z, int m)
2485 int L;
2486 switch (m) {
2487 case STBI__MARKER_none: // no marker found
2488 return stbi__err("expected marker","Corrupt JPEG");
2490 case 0xDD: // DRI - specify restart interval
2491 if (stbi__get16be(z->s) != 4) return stbi__err("bad DRI len","Corrupt JPEG");
2492 z->restart_interval = stbi__get16be(z->s);
2493 return 1;
2495 case 0xDB: // DQT - define quantization table
2496 L = stbi__get16be(z->s)-2;
2497 while (L > 0) {
2498 int q = stbi__get8(z->s);
2499 int p = q >> 4;
2500 int t = q & 15,i;
2501 if (p != 0) return stbi__err("bad DQT type","Corrupt JPEG");
2502 if (t > 3) return stbi__err("bad DQT table","Corrupt JPEG");
2503 for (i=0; i < 64; ++i)
2504 z->dequant[t][stbi__jpeg_dezigzag[i]] = stbi__get8(z->s);
2505 L -= 65;
2507 return L==0;
2509 case 0xC4: // DHT - define huffman table
2510 L = stbi__get16be(z->s)-2;
2511 while (L > 0) {
2512 stbi_uc *v;
2513 int sizes[16],i,n=0;
2514 int q = stbi__get8(z->s);
2515 int tc = q >> 4;
2516 int th = q & 15;
2517 if (tc > 1 || th > 3) return stbi__err("bad DHT header","Corrupt JPEG");
2518 for (i=0; i < 16; ++i) {
2519 sizes[i] = stbi__get8(z->s);
2520 n += sizes[i];
2522 L -= 17;
2523 if (tc == 0) {
2524 if (!stbi__build_huffman(z->huff_dc+th, sizes)) return 0;
2525 v = z->huff_dc[th].values;
2526 } else {
2527 if (!stbi__build_huffman(z->huff_ac+th, sizes)) return 0;
2528 v = z->huff_ac[th].values;
2530 for (i=0; i < n; ++i)
2531 v[i] = stbi__get8(z->s);
2532 if (tc != 0)
2533 stbi__build_fast_ac(z->fast_ac[th], z->huff_ac + th);
2534 L -= n;
2536 return L==0;
2538 // check for comment block or APP blocks
2539 if ((m >= 0xE0 && m <= 0xEF) || m == 0xFE) {
2540 stbi__skip(z->s, stbi__get16be(z->s)-2);
2541 return 1;
2543 return 0;
2546 // after we see SOS
2547 static int stbi__process_scan_header(stbi__jpeg *z)
2549 int i;
2550 int Ls = stbi__get16be(z->s);
2551 z->scan_n = stbi__get8(z->s);
2552 if (z->scan_n < 1 || z->scan_n > 4 || z->scan_n > (int) z->s->img_n) return stbi__err("bad SOS component count","Corrupt JPEG");
2553 if (Ls != 6+2*z->scan_n) return stbi__err("bad SOS len","Corrupt JPEG");
2554 for (i=0; i < z->scan_n; ++i) {
2555 int id = stbi__get8(z->s), which;
2556 int q = stbi__get8(z->s);
2557 for (which = 0; which < z->s->img_n; ++which)
2558 if (z->img_comp[which].id == id)
2559 break;
2560 if (which == z->s->img_n) return 0; // no match
2561 z->img_comp[which].hd = q >> 4; if (z->img_comp[which].hd > 3) return stbi__err("bad DC huff","Corrupt JPEG");
2562 z->img_comp[which].ha = q & 15; if (z->img_comp[which].ha > 3) return stbi__err("bad AC huff","Corrupt JPEG");
2563 z->order[i] = which;
2567 int aa;
2568 z->spec_start = stbi__get8(z->s);
2569 z->spec_end = stbi__get8(z->s); // should be 63, but might be 0
2570 aa = stbi__get8(z->s);
2571 z->succ_high = (aa >> 4);
2572 z->succ_low = (aa & 15);
2573 if (z->progressive) {
2574 if (z->spec_start > 63 || z->spec_end > 63 || z->spec_start > z->spec_end || z->succ_high > 13 || z->succ_low > 13)
2575 return stbi__err("bad SOS", "Corrupt JPEG");
2576 } else {
2577 if (z->spec_start != 0) return stbi__err("bad SOS","Corrupt JPEG");
2578 if (z->succ_high != 0 || z->succ_low != 0) return stbi__err("bad SOS","Corrupt JPEG");
2579 z->spec_end = 63;
2583 return 1;
2586 static int stbi__process_frame_header(stbi__jpeg *z, int scan)
2588 stbi__context *s = z->s;
2589 int Lf,p,i,q, h_max=1,v_max=1,c;
2590 Lf = stbi__get16be(s); if (Lf < 11) return stbi__err("bad SOF len","Corrupt JPEG"); // JPEG
2591 p = stbi__get8(s); if (p != 8) return stbi__err("only 8-bit","JPEG format not supported: 8-bit only"); // JPEG baseline
2592 s->img_y = stbi__get16be(s); if (s->img_y == 0) return stbi__err("no header height", "JPEG format not supported: delayed height"); // Legal, but we don't handle it--but neither does IJG
2593 s->img_x = stbi__get16be(s); if (s->img_x == 0) return stbi__err("0 width","Corrupt JPEG"); // JPEG requires
2594 c = stbi__get8(s);
2595 if (c != 3 && c != 1) return stbi__err("bad component count","Corrupt JPEG"); // JFIF requires
2596 s->img_n = c;
2597 for (i=0; i < c; ++i) {
2598 z->img_comp[i].data = NULL;
2599 z->img_comp[i].linebuf = NULL;
2602 if (Lf != 8+3*s->img_n) return stbi__err("bad SOF len","Corrupt JPEG");
2604 for (i=0; i < s->img_n; ++i) {
2605 z->img_comp[i].id = stbi__get8(s);
2606 if (z->img_comp[i].id != i+1) // JFIF requires
2607 if (z->img_comp[i].id != i) // some version of jpegtran outputs non-JFIF-compliant files!
2608 return stbi__err("bad component ID","Corrupt JPEG");
2609 q = stbi__get8(s);
2610 z->img_comp[i].h = (q >> 4); if (!z->img_comp[i].h || z->img_comp[i].h > 4) return stbi__err("bad H","Corrupt JPEG");
2611 z->img_comp[i].v = q & 15; if (!z->img_comp[i].v || z->img_comp[i].v > 4) return stbi__err("bad V","Corrupt JPEG");
2612 z->img_comp[i].tq = stbi__get8(s); if (z->img_comp[i].tq > 3) return stbi__err("bad TQ","Corrupt JPEG");
2615 if (scan != STBI__SCAN_load) return 1;
2617 if ((1 << 30) / s->img_x / s->img_n < s->img_y) return stbi__err("too large", "Image too large to decode");
2619 for (i=0; i < s->img_n; ++i) {
2620 if (z->img_comp[i].h > h_max) h_max = z->img_comp[i].h;
2621 if (z->img_comp[i].v > v_max) v_max = z->img_comp[i].v;
2624 // compute interleaved mcu info
2625 z->img_h_max = h_max;
2626 z->img_v_max = v_max;
2627 z->img_mcu_w = h_max * 8;
2628 z->img_mcu_h = v_max * 8;
2629 z->img_mcu_x = (s->img_x + z->img_mcu_w-1) / z->img_mcu_w;
2630 z->img_mcu_y = (s->img_y + z->img_mcu_h-1) / z->img_mcu_h;
2632 for (i=0; i < s->img_n; ++i) {
2633 // number of effective pixels (e.g. for non-interleaved MCU)
2634 z->img_comp[i].x = (s->img_x * z->img_comp[i].h + h_max-1) / h_max;
2635 z->img_comp[i].y = (s->img_y * z->img_comp[i].v + v_max-1) / v_max;
2636 // to simplify generation, we'll allocate enough memory to decode
2637 // the bogus oversized data from using interleaved MCUs and their
2638 // big blocks (e.g. a 16x16 iMCU on an image of width 33); we won't
2639 // discard the extra data until colorspace conversion
2640 z->img_comp[i].w2 = z->img_mcu_x * z->img_comp[i].h * 8;
2641 z->img_comp[i].h2 = z->img_mcu_y * z->img_comp[i].v * 8;
2642 z->img_comp[i].raw_data = stbi__malloc(z->img_comp[i].w2 * z->img_comp[i].h2+15);
2644 if (z->img_comp[i].raw_data == NULL) {
2645 for(--i; i >= 0; --i) {
2646 STBI_FREE(z->img_comp[i].raw_data);
2647 z->img_comp[i].raw_data = NULL; z->img_comp[i].data = NULL; // clear both so later cleanup cannot free raw_data twice
2649 return stbi__err("outofmem", "Out of memory");
2651 // align blocks for idct using mmx/sse
2652 z->img_comp[i].data = (stbi_uc*) (((size_t) z->img_comp[i].raw_data + 15) & ~15);
2653 z->img_comp[i].linebuf = NULL;
2654 if (z->progressive) {
2655 z->img_comp[i].coeff_w = (z->img_comp[i].w2 + 7) >> 3;
2656 z->img_comp[i].coeff_h = (z->img_comp[i].h2 + 7) >> 3;
2657 z->img_comp[i].raw_coeff = STBI_MALLOC(z->img_comp[i].coeff_w * z->img_comp[i].coeff_h * 64 * sizeof(short) + 15);
2658 z->img_comp[i].coeff = (short*) (((size_t) z->img_comp[i].raw_coeff + 15) & ~15);
2659 } else {
2660 z->img_comp[i].coeff = 0;
2661 z->img_comp[i].raw_coeff = 0;
2665 return 1;
2668 // use comparisons since in some cases we handle more than one case (e.g. SOF)
2669 #define stbi__DNL(x) ((x) == 0xdc)
2670 #define stbi__SOI(x) ((x) == 0xd8)
2671 #define stbi__EOI(x) ((x) == 0xd9)
2672 #define stbi__SOF(x) ((x) == 0xc0 || (x) == 0xc1 || (x) == 0xc2)
2673 #define stbi__SOS(x) ((x) == 0xda)
2675 #define stbi__SOF_progressive(x) ((x) == 0xc2)
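// Illustration only, not part of stb_image: a worked sketch of the MCU
// geometry computed in stbi__process_frame_header, using the case its comment
// mentions (16x16 interleaved MCUs on an image of width 33). The struct and
// the sketch_* names are hypothetical; only the horizontal direction is shown,
// the vertical one follows the same arithmetic.

```c
typedef struct {
   int x;    /* effective pixel width of the component */
   int w2;   /* allocated width, padded to whole interleaved MCUs */
} sketch_comp_width;

/* number of MCUs needed to cover img_size pixels, rounding up */
static int sketch_mcu_count(int img_size, int max_factor)
{
   int mcu_size = max_factor * 8;
   return (img_size + mcu_size - 1) / mcu_size;
}

/* widths for one component with horizontal sampling factor h */
static sketch_comp_width sketch_comp_geometry(int img_x, int h, int h_max)
{
   sketch_comp_width g;
   g.x  = (img_x * h + h_max - 1) / h_max;
   g.w2 = sketch_mcu_count(img_x, h_max) * h * 8;
   return g;
}
```

// For img_x = 33 and h_max = 2 there are 3 MCUs per row. A component with
// h = 2 has 33 effective pixels but 48 allocated; one with h = 1 has 17
// effective pixels and 24 allocated.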
2677 static int stbi__decode_jpeg_header(stbi__jpeg *z, int scan)
2679 int m;
2680 z->marker = STBI__MARKER_none; // initialize cached marker to empty
2681 m = stbi__get_marker(z);
2682 if (!stbi__SOI(m)) return stbi__err("no SOI","Corrupt JPEG");
2683 if (scan == STBI__SCAN_type) return 1;
2684 m = stbi__get_marker(z);
2685 while (!stbi__SOF(m)) {
2686 if (!stbi__process_marker(z,m)) return 0;
2687 m = stbi__get_marker(z);
2688 while (m == STBI__MARKER_none) {
2689 // some files have extra padding after their blocks, so ok, we'll scan
2690 if (stbi__at_eof(z->s)) return stbi__err("no SOF", "Corrupt JPEG");
2691 m = stbi__get_marker(z);
2694 z->progressive = stbi__SOF_progressive(m);
2695 if (!stbi__process_frame_header(z, scan)) return 0;
2696 return 1;
2699 // decode image to YCbCr format
2700 static int stbi__decode_jpeg_image(stbi__jpeg *j)
2702 int m;
2703 j->restart_interval = 0;
2704 if (!stbi__decode_jpeg_header(j, STBI__SCAN_load)) return 0;
2705 m = stbi__get_marker(j);
2706 while (!stbi__EOI(m)) {
2707 if (stbi__SOS(m)) {
2708 if (!stbi__process_scan_header(j)) return 0;
2709 if (!stbi__parse_entropy_coded_data(j)) return 0;
2710 if (j->marker == STBI__MARKER_none ) {
2711 // handle 0s at the end of image data from IP Kamera 9060
2712 while (!stbi__at_eof(j->s)) {
2713 int x = stbi__get8(j->s);
2714 if (x == 255) {
2715 j->marker = stbi__get8(j->s);
2716 break;
2717 } else if (x != 0) {
2718 return stbi__err("junk before marker", "Corrupt JPEG");
2721 // if we reach eof without hitting a marker, stbi__get_marker() below will fail and we'll eventually return 0
2723 } else {
2724 if (!stbi__process_marker(j, m)) return 0;
2726 m = stbi__get_marker(j);
2728 if (j->progressive)
2729 stbi__jpeg_finish(j);
2730 return 1;
2733 // static jfif-centered resampling (across block boundaries)
2735 typedef stbi_uc *(*resample_row_func)(stbi_uc *out, stbi_uc *in0, stbi_uc *in1,
2736 int w, int hs);
2738 #define stbi__div4(x) ((stbi_uc) ((x) >> 2))
2740 static stbi_uc *resample_row_1(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
2742 STBI_NOTUSED(out);
2743 STBI_NOTUSED(in_far);
2744 STBI_NOTUSED(w);
2745 STBI_NOTUSED(hs);
2746 return in_near;
2749 static stbi_uc* stbi__resample_row_v_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
2751 // need to generate two samples vertically for every one in input
2752 int i;
2753 STBI_NOTUSED(hs);
2754 for (i=0; i < w; ++i)
2755 out[i] = stbi__div4(3*in_near[i] + in_far[i] + 2);
2756 return out;
2759 static stbi_uc* stbi__resample_row_h_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
2761 // need to generate two samples horizontally for every one in input
2762 int i;
2763 stbi_uc *input = in_near;
2765 if (w == 1) {
2766 // if only one sample, can't do any interpolation
2767 out[0] = out[1] = input[0];
2768 return out;
2771 out[0] = input[0];
2772 out[1] = stbi__div4(input[0]*3 + input[1] + 2);
2773 for (i=1; i < w-1; ++i) {
2774 int n = 3*input[i]+2;
2775 out[i*2+0] = stbi__div4(n+input[i-1]);
2776 out[i*2+1] = stbi__div4(n+input[i+1]);
2778 out[i*2+0] = stbi__div4(input[w-2]*3 + input[w-1] + 2);
2779 out[i*2+1] = input[w-1];
2781 STBI_NOTUSED(in_far);
2782 STBI_NOTUSED(hs);
2784 return out;
2787 #define stbi__div16(x) ((stbi_uc) ((x) >> 4))
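// Illustration only, not part of stb_image: a sketch of the filter taps the
// resamplers around this point use. The 1-D case weights the nearer sample
// 3/4 and the farther one 1/4; the 2-D case blends vertically first (kept
// scaled by 4), then horizontally, and divides by 16 at the end. The
// sketch_* names are hypothetical.

```c
/* one output sample of the 1-D filter, rounded to nearest */
static unsigned char sketch_mix_3_1(unsigned char near_px, unsigned char far_px)
{
   return (unsigned char)((3 * near_px + far_px + 2) >> 2);
}

/* one output sample of the 2-D filter: column 0 is the nearer column,
   column 1 the farther one; each column has a near and a far row */
static unsigned char sketch_mix_hv(unsigned char near0, unsigned char far0,
                                   unsigned char near1, unsigned char far1)
{
   int t0 = 3 * near0 + far0;   /* vertical blend of column 0, scaled by 4 */
   int t1 = 3 * near1 + far1;   /* vertical blend of column 1, scaled by 4 */
   return (unsigned char)((3 * t0 + t1 + 8) >> 4);
}
```

// With samples 100 and 200 the 1-D filter yields 125 next to the 100 and 175
// next to the 200. When both rows are identical, the 2-D filter gives the
// same 125 for this input.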
2789 static stbi_uc *stbi__resample_row_hv_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
2791 // need to generate 2x2 samples for every one in input
2792 int i,t0,t1;
2793 if (w == 1) {
2794 out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
2795 return out;
2798 t1 = 3*in_near[0] + in_far[0];
2799 out[0] = stbi__div4(t1+2);
2800 for (i=1; i < w; ++i) {
2801 t0 = t1;
2802 t1 = 3*in_near[i]+in_far[i];
2803 out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
2804 out[i*2 ] = stbi__div16(3*t1 + t0 + 8);
2806 out[w*2-1] = stbi__div4(t1+2);
2808 STBI_NOTUSED(hs);
2810 return out;
2813 #if defined(STBI_SSE2) || defined(STBI_NEON)
2814 static stbi_uc *stbi__resample_row_hv_2_simd(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
2816 // need to generate 2x2 samples for every one in input
2817 int i=0,t0,t1;
2819 if (w == 1) {
2820 out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
2821 return out;
2824 t1 = 3*in_near[0] + in_far[0];
   // process groups of 8 pixels for as long as we can.
   // note we can't handle the last pixel in a row in this loop
   // because we need to handle the filter boundary conditions.
   for (; i < ((w-1) & ~7); i += 8) {
#if defined(STBI_SSE2)
      // load and perform the vertical filtering pass
      // this uses 3*x + y = 4*x + (y - x)
      __m128i zero  = _mm_setzero_si128();
      __m128i farb  = _mm_loadl_epi64((__m128i *) (in_far + i));
      __m128i nearb = _mm_loadl_epi64((__m128i *) (in_near + i));
      __m128i farw  = _mm_unpacklo_epi8(farb, zero);
      __m128i nearw = _mm_unpacklo_epi8(nearb, zero);
      __m128i diff  = _mm_sub_epi16(farw, nearw);
      __m128i nears = _mm_slli_epi16(nearw, 2);
      __m128i curr  = _mm_add_epi16(nears, diff); // current row

      // horizontal filter works the same way, based on shifted versions of
      // the current row. "prev" is current row shifted right by 1 pixel; we
      // need to insert the previous pixel value (from t1).
      // "next" is current row shifted left by 1 pixel, with first pixel
      // of next block of 8 pixels added in.
      __m128i prv0 = _mm_slli_si128(curr, 2);
      __m128i nxt0 = _mm_srli_si128(curr, 2);
      __m128i prev = _mm_insert_epi16(prv0, t1, 0);
      __m128i next = _mm_insert_epi16(nxt0, 3*in_near[i+8] + in_far[i+8], 7);

      // horizontal filter, polyphase implementation since it's convenient:
      // even pixels = 3*cur + prev = cur*4 + (prev - cur)
      // odd  pixels = 3*cur + next = cur*4 + (next - cur)
      // note the shared term.
      __m128i bias = _mm_set1_epi16(8);
      __m128i curs = _mm_slli_epi16(curr, 2);
      __m128i prvd = _mm_sub_epi16(prev, curr);
      __m128i nxtd = _mm_sub_epi16(next, curr);
      __m128i curb = _mm_add_epi16(curs, bias);
      __m128i even = _mm_add_epi16(prvd, curb);
      __m128i odd  = _mm_add_epi16(nxtd, curb);

      // interleave even and odd pixels, then undo scaling.
      __m128i int0 = _mm_unpacklo_epi16(even, odd);
      __m128i int1 = _mm_unpackhi_epi16(even, odd);
      __m128i de0  = _mm_srli_epi16(int0, 4);
      __m128i de1  = _mm_srli_epi16(int1, 4);

      // pack and write output
      __m128i outv = _mm_packus_epi16(de0, de1);
      _mm_storeu_si128((__m128i *) (out + i*2), outv);
#elif defined(STBI_NEON)
      // load and perform the vertical filtering pass
      // this uses 3*x + y = 4*x + (y - x)
      uint8x8_t farb  = vld1_u8(in_far + i);
      uint8x8_t nearb = vld1_u8(in_near + i);
      int16x8_t diff  = vreinterpretq_s16_u16(vsubl_u8(farb, nearb));
      int16x8_t nears = vreinterpretq_s16_u16(vshll_n_u8(nearb, 2));
      int16x8_t curr  = vaddq_s16(nears, diff); // current row

      // horizontal filter works the same way, based on shifted versions of
      // the current row. "prev" is current row shifted right by 1 pixel; we
      // need to insert the previous pixel value (from t1).
      // "next" is current row shifted left by 1 pixel, with first pixel
      // of next block of 8 pixels added in.
      int16x8_t prv0 = vextq_s16(curr, curr, 7);
      int16x8_t nxt0 = vextq_s16(curr, curr, 1);
      int16x8_t prev = vsetq_lane_s16(t1, prv0, 0);
      int16x8_t next = vsetq_lane_s16(3*in_near[i+8] + in_far[i+8], nxt0, 7);

      // horizontal filter, polyphase implementation since it's convenient:
      // even pixels = 3*cur + prev = cur*4 + (prev - cur)
      // odd  pixels = 3*cur + next = cur*4 + (next - cur)
      // note the shared term.
      int16x8_t curs = vshlq_n_s16(curr, 2);
      int16x8_t prvd = vsubq_s16(prev, curr);
      int16x8_t nxtd = vsubq_s16(next, curr);
      int16x8_t even = vaddq_s16(curs, prvd);
      int16x8_t odd  = vaddq_s16(curs, nxtd);

      // undo scaling and round, then store with even/odd phases interleaved
      uint8x8x2_t o;
      o.val[0] = vqrshrun_n_s16(even, 4);
      o.val[1] = vqrshrun_n_s16(odd,  4);
      vst2_u8(out + i*2, o);
#endif

      // "previous" value for next iter
      t1 = 3*in_near[i+7] + in_far[i+7];
   }

   t0 = t1;
   t1 = 3*in_near[i] + in_far[i];
   out[i*2] = stbi__div16(3*t1 + t0 + 8);

   for (++i; i < w; ++i) {
      t0 = t1;
      t1 = 3*in_near[i]+in_far[i];
      out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
      out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
   }
   out[w*2-1] = stbi__div4(t1+2);

   STBI_NOTUSED(hs);

   return out;
}
#endif
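// Illustrative sketch, not part of stb_image: a scalar version of the same
// 2x2 upsampling filter, which may help when reading the SIMD code above.
// Each source pixel is first blended vertically as 3*near + far (scaled by 4),
// then each output pixel blends two neighboring vertical sums 3:1 (scaled by
// 16 in total). All names prefixed sketch_ are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t sketch_div4(int x)  { return (uint8_t)(x >> 2); }
static uint8_t sketch_div16(int x) { return (uint8_t)(x >> 4); }

// Upsample one row 2x horizontally while blending two source rows 3:1.
// out must hold 2*w bytes; in_near is the closer source row, in_far the other.
static void sketch_upsample_hv2(uint8_t *out, const uint8_t *in_near,
                                const uint8_t *in_far, int w)
{
   int i, t0, t1;
   assert(w >= 1);
   t1 = 3*in_near[0] + in_far[0];                // vertical pass, scaled by 4
   out[0] = sketch_div4(t1 + 2);                 // left edge: no horizontal neighbor
   for (i = 1; i < w; ++i) {
      t0 = t1;
      t1 = 3*in_near[i] + in_far[i];
      out[i*2-1] = sketch_div16(3*t0 + t1 + 8);  // odd phase of pixel i-1
      out[i*2  ] = sketch_div16(3*t1 + t0 + 8);  // even phase of pixel i
   }
   out[w*2-1] = sketch_div4(t1 + 2);             // right edge
}
```

// The +2 and +8 terms are the rounding biases for the >>2 and >>4 descales.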
static stbi_uc *stbi__resample_row_generic(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
{
   // resample with nearest-neighbor
   int i,j;
   STBI_NOTUSED(in_far);
   for (i=0; i < w; ++i)
      for (j=0; j < hs; ++j)
         out[i*hs+j] = in_near[i];
   return out;
}
#ifdef STBI_JPEG_OLD
// this is the same YCbCr-to-RGB calculation that stb_image has used
// historically before the algorithm changes in 1.49
#define float2fixed(x)  ((int) ((x) * 65536 + 0.5))
static void stbi__YCbCr_to_RGB_row(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step)
{
   int i;
   for (i=0; i < count; ++i) {
      int y_fixed = (y[i] << 16) + 32768; // rounding
      int r,g,b;
      int cr = pcr[i] - 128;
      int cb = pcb[i] - 128;
      r = y_fixed + cr*float2fixed(1.40200f);
      g = y_fixed - cr*float2fixed(0.71414f) - cb*float2fixed(0.34414f);
      b = y_fixed + cb*float2fixed(1.77200f);
      r >>= 16;
      g >>= 16;
      b >>= 16;
      if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
      if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
      if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
      out[0] = (stbi_uc)r;
      out[1] = (stbi_uc)g;
      out[2] = (stbi_uc)b;
      out[3] = 255;
      out += step;
   }
}
#else
// this is a reduced-precision calculation of YCbCr-to-RGB introduced
// to make sure the code produces the same results in both SIMD and scalar
#define float2fixed(x)  (((int) ((x) * 4096.0f + 0.5f)) << 8)
static void stbi__YCbCr_to_RGB_row(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step)
{
   int i;
   for (i=0; i < count; ++i) {
      int y_fixed = (y[i] << 20) + (1<<19); // rounding
      int r,g,b;
      int cr = pcr[i] - 128;
      int cb = pcb[i] - 128;
      r = y_fixed +  cr* float2fixed(1.40200f);
      g = y_fixed + (cr*-float2fixed(0.71414f)) + ((cb*-float2fixed(0.34414f)) & 0xffff0000);
      b = y_fixed                               +   cb* float2fixed(1.77200f);
      r >>= 20;
      g >>= 20;
      b >>= 20;
      if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
      if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
      if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
      out[0] = (stbi_uc)r;
      out[1] = (stbi_uc)g;
      out[2] = (stbi_uc)b;
      out[3] = 255;
      out += step;
   }
}
#endif
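// Illustrative sketch, not part of stb_image: the reduced-precision
// fixed-point conversion above, applied to a single pixel. It assumes the
// same constants and 20-bit scaling as the non-OLD path, but omits the
// "& 0xffff0000" masking that the source applies to the cb term of green to
// match the SIMD result, so green may differ from stb_image in the last bit.
// Like the source, it relies on >> being an arithmetic shift for negative
// ints. Names prefixed sketch_/SKETCH_ are hypothetical.

```c
#include <stdint.h>

// 12 fractional bits for the constant, shifted up by 8 to reach 20 bits.
#define SKETCH_F2F(x)  (((int) ((x) * 4096.0f + 0.5f)) << 8)

static uint8_t sketch_clamp255(int v)
{
   return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

// Convert one YCbCr pixel; rgb receives R, G, B in that order.
static void sketch_ycbcr_to_rgb(uint8_t y, uint8_t cb_in, uint8_t cr_in, uint8_t rgb[3])
{
   int y_fixed = (y << 20) + (1 << 19);   // 20 fractional bits, plus 0.5 for rounding
   int cr = cr_in - 128;                  // chroma is stored with a +128 bias
   int cb = cb_in - 128;
   int r = y_fixed + cr * SKETCH_F2F(1.40200f);
   int g = y_fixed - cr * SKETCH_F2F(0.71414f) - cb * SKETCH_F2F(0.34414f);
   int b = y_fixed + cb * SKETCH_F2F(1.77200f);
   rgb[0] = sketch_clamp255(r >> 20);
   rgb[1] = sketch_clamp255(g >> 20);
   rgb[2] = sketch_clamp255(b >> 20);
}
```

// With neutral chroma (cb == cr == 128) the output is gray with value y.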
#if defined(STBI_SSE2) || defined(STBI_NEON)
static void stbi__YCbCr_to_RGB_simd(stbi_uc *out, stbi_uc const *y, stbi_uc const *pcb, stbi_uc const *pcr, int count, int step)
{
   int i = 0;

#ifdef STBI_SSE2
   // step == 3 is pretty ugly on the final interleave, and i'm not convinced
   // it's useful in practice (you wouldn't use it for textures, for example).
   // so just accelerate step == 4 case.
   if (step == 4) {
      // this is a fairly straightforward implementation and not super-optimized.
      __m128i signflip  = _mm_set1_epi8(-0x80);
      __m128i cr_const0 = _mm_set1_epi16(   (short) ( 1.40200f*4096.0f+0.5f));
      __m128i cr_const1 = _mm_set1_epi16( - (short) ( 0.71414f*4096.0f+0.5f));
      __m128i cb_const0 = _mm_set1_epi16( - (short) ( 0.34414f*4096.0f+0.5f));
      __m128i cb_const1 = _mm_set1_epi16(   (short) ( 1.77200f*4096.0f+0.5f));
      __m128i y_bias = _mm_set1_epi8((char) (unsigned char) 128);
      __m128i xw = _mm_set1_epi16(255); // alpha channel

      for (; i+7 < count; i += 8) {
         // load
         __m128i y_bytes = _mm_loadl_epi64((__m128i *) (y+i));
         __m128i cr_bytes = _mm_loadl_epi64((__m128i *) (pcr+i));
         __m128i cb_bytes = _mm_loadl_epi64((__m128i *) (pcb+i));
         __m128i cr_biased = _mm_xor_si128(cr_bytes, signflip); // -128
         __m128i cb_biased = _mm_xor_si128(cb_bytes, signflip); // -128

         // unpack to short (and left-shift cr, cb by 8)
         __m128i yw  = _mm_unpacklo_epi8(y_bias, y_bytes);
         __m128i crw = _mm_unpacklo_epi8(_mm_setzero_si128(), cr_biased);
         __m128i cbw = _mm_unpacklo_epi8(_mm_setzero_si128(), cb_biased);

         // color transform
         __m128i yws = _mm_srli_epi16(yw, 4);
         __m128i cr0 = _mm_mulhi_epi16(cr_const0, crw);
         __m128i cb0 = _mm_mulhi_epi16(cb_const0, cbw);
         __m128i cb1 = _mm_mulhi_epi16(cbw, cb_const1);
         __m128i cr1 = _mm_mulhi_epi16(crw, cr_const1);
         __m128i rws = _mm_add_epi16(cr0, yws);
         __m128i gwt = _mm_add_epi16(cb0, yws);
         __m128i bws = _mm_add_epi16(yws, cb1);
         __m128i gws = _mm_add_epi16(gwt, cr1);

         // descale
         __m128i rw = _mm_srai_epi16(rws, 4);
         __m128i bw = _mm_srai_epi16(bws, 4);
         __m128i gw = _mm_srai_epi16(gws, 4);

         // back to byte, set up for transpose
         __m128i brb = _mm_packus_epi16(rw, bw);
         __m128i gxb = _mm_packus_epi16(gw, xw);

         // transpose to interleave channels
         __m128i t0 = _mm_unpacklo_epi8(brb, gxb);
         __m128i t1 = _mm_unpackhi_epi8(brb, gxb);
         __m128i o0 = _mm_unpacklo_epi16(t0, t1);
         __m128i o1 = _mm_unpackhi_epi16(t0, t1);

         // store
         _mm_storeu_si128((__m128i *) (out + 0), o0);
         _mm_storeu_si128((__m128i *) (out + 16), o1);
         out += 32;
      }
   }
#endif

#ifdef STBI_NEON
   // in this version, step=3 support would be easy to add. but is there demand?
   if (step == 4) {
      // this is a fairly straightforward implementation and not super-optimized.
      uint8x8_t signflip = vdup_n_u8(0x80);
      int16x8_t cr_const0 = vdupq_n_s16(   (short) ( 1.40200f*4096.0f+0.5f));
      int16x8_t cr_const1 = vdupq_n_s16( - (short) ( 0.71414f*4096.0f+0.5f));
      int16x8_t cb_const0 = vdupq_n_s16( - (short) ( 0.34414f*4096.0f+0.5f));
      int16x8_t cb_const1 = vdupq_n_s16(   (short) ( 1.77200f*4096.0f+0.5f));

      for (; i+7 < count; i += 8) {
         // load
         uint8x8_t y_bytes  = vld1_u8(y + i);
         uint8x8_t cr_bytes = vld1_u8(pcr + i);
         uint8x8_t cb_bytes = vld1_u8(pcb + i);
         int8x8_t cr_biased = vreinterpret_s8_u8(vsub_u8(cr_bytes, signflip));
         int8x8_t cb_biased = vreinterpret_s8_u8(vsub_u8(cb_bytes, signflip));

         // expand to s16
         int16x8_t yws = vreinterpretq_s16_u16(vshll_n_u8(y_bytes, 4));
         int16x8_t crw = vshll_n_s8(cr_biased, 7);
         int16x8_t cbw = vshll_n_s8(cb_biased, 7);

         // color transform
         int16x8_t cr0 = vqdmulhq_s16(crw, cr_const0);
         int16x8_t cb0 = vqdmulhq_s16(cbw, cb_const0);
         int16x8_t cr1 = vqdmulhq_s16(crw, cr_const1);
         int16x8_t cb1 = vqdmulhq_s16(cbw, cb_const1);
         int16x8_t rws = vaddq_s16(yws, cr0);
         int16x8_t gws = vaddq_s16(vaddq_s16(yws, cb0), cr1);
         int16x8_t bws = vaddq_s16(yws, cb1);

         // undo scaling, round, convert to byte
         uint8x8x4_t o;
         o.val[0] = vqrshrun_n_s16(rws, 4);
         o.val[1] = vqrshrun_n_s16(gws, 4);
         o.val[2] = vqrshrun_n_s16(bws, 4);
         o.val[3] = vdup_n_u8(255);

         // store, interleaving r/g/b/a
         vst4_u8(out, o);
         out += 8*4;
      }
   }
#endif

   for (; i < count; ++i) {
      int y_fixed = (y[i] << 20) + (1<<19); // rounding
      int r,g,b;
      int cr = pcr[i] - 128;
      int cb = pcb[i] - 128;
      r = y_fixed + cr* float2fixed(1.40200f);
      g = y_fixed + cr*-float2fixed(0.71414f) + ((cb*-float2fixed(0.34414f)) & 0xffff0000);
      b = y_fixed                             +   cb* float2fixed(1.77200f);
      r >>= 20;
      g >>= 20;
      b >>= 20;
      if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
      if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
      if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
      out[0] = (stbi_uc)r;
      out[1] = (stbi_uc)g;
      out[2] = (stbi_uc)b;
      out[3] = 255;
      out += step;
   }
}
#endif
// set up the kernels
static void stbi__setup_jpeg(stbi__jpeg *j)
{
   j->idct_block_kernel = stbi__idct_block;
   j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_row;
   j->resample_row_hv_2_kernel = stbi__resample_row_hv_2;

#ifdef STBI_SSE2
   if (stbi__sse2_available()) {
      j->idct_block_kernel = stbi__idct_simd;
      #ifndef STBI_JPEG_OLD
      j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
      #endif
      j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
   }
#endif

#ifdef STBI_NEON
   j->idct_block_kernel = stbi__idct_simd;
   #ifndef STBI_JPEG_OLD
   j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
   #endif
   j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
#endif
}
// clean up the temporary component buffers
static void stbi__cleanup_jpeg(stbi__jpeg *j)
{
   int i;
   for (i=0; i < j->s->img_n; ++i) {
      if (j->img_comp[i].raw_data) {
         STBI_FREE(j->img_comp[i].raw_data);
         j->img_comp[i].raw_data = NULL;
         j->img_comp[i].data = NULL;
      }
      if (j->img_comp[i].raw_coeff) {
         STBI_FREE(j->img_comp[i].raw_coeff);
         j->img_comp[i].raw_coeff = 0;
         j->img_comp[i].coeff = 0;
      }
      if (j->img_comp[i].linebuf) {
         STBI_FREE(j->img_comp[i].linebuf);
         j->img_comp[i].linebuf = NULL;
      }
   }
}
typedef struct
{
   resample_row_func resample;
   stbi_uc *line0,*line1;
   int hs,vs;   // expansion factor in each axis
   int w_lores; // horizontal pixels pre-expansion
   int ystep;   // how far through vertical expansion we are
   int ypos;    // which pre-expansion row we're on
} stbi__resample;
static stbi_uc *load_jpeg_image(stbi__jpeg *z, int *out_x, int *out_y, int *comp, int req_comp)
{
   int n, decode_n;
   z->s->img_n = 0; // make stbi__cleanup_jpeg safe

   // validate req_comp
   if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");

   // load a jpeg image from whichever source, but leave in YCbCr format
   if (!stbi__decode_jpeg_image(z)) { stbi__cleanup_jpeg(z); return NULL; }

   // determine actual number of components to generate
   n = req_comp ? req_comp : z->s->img_n;

   if (z->s->img_n == 3 && n < 3)
      decode_n = 1;
   else
      decode_n = z->s->img_n;

   // resample and color-convert
   {
      int k;
      unsigned int i,j;
      stbi_uc *output;
      stbi_uc *coutput[4];

      stbi__resample res_comp[4];

      for (k=0; k < decode_n; ++k) {
         stbi__resample *r = &res_comp[k];

         // allocate line buffer big enough for upsampling off the edges
         // with upsample factor of 4
         z->img_comp[k].linebuf = (stbi_uc *) stbi__malloc(z->s->img_x + 3);
         if (!z->img_comp[k].linebuf) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }

         r->hs      = z->img_h_max / z->img_comp[k].h;
         r->vs      = z->img_v_max / z->img_comp[k].v;
         r->ystep   = r->vs >> 1;
         r->w_lores = (z->s->img_x + r->hs-1) / r->hs;
         r->ypos    = 0;
         r->line0   = r->line1 = z->img_comp[k].data;

         if      (r->hs == 1 && r->vs == 1) r->resample = resample_row_1;
         else if (r->hs == 1 && r->vs == 2) r->resample = stbi__resample_row_v_2;
         else if (r->hs == 2 && r->vs == 1) r->resample = stbi__resample_row_h_2;
         else if (r->hs == 2 && r->vs == 2) r->resample = z->resample_row_hv_2_kernel;
         else                               r->resample = stbi__resample_row_generic;
      }

      // can't error after this, so this is safe
      output = (stbi_uc *) stbi__malloc(n * z->s->img_x * z->s->img_y + 1);
      if (!output) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }

      // now go ahead and resample
      for (j=0; j < z->s->img_y; ++j) {
         stbi_uc *out = output + n * z->s->img_x * j;
         for (k=0; k < decode_n; ++k) {
            stbi__resample *r = &res_comp[k];
            int y_bot = r->ystep >= (r->vs >> 1);
            coutput[k] = r->resample(z->img_comp[k].linebuf,
                                     y_bot ? r->line1 : r->line0,
                                     y_bot ? r->line0 : r->line1,
                                     r->w_lores, r->hs);
            if (++r->ystep >= r->vs) {
               r->ystep = 0;
               r->line0 = r->line1;
               if (++r->ypos < z->img_comp[k].y)
                  r->line1 += z->img_comp[k].w2;
            }
         }
         if (n >= 3) {
            stbi_uc *y = coutput[0];
            if (z->s->img_n == 3) {
               z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
            } else
               for (i=0; i < z->s->img_x; ++i) {
                  out[0] = out[1] = out[2] = y[i];
                  out[3] = 255; // not used if n==3
                  out += n;
               }
         } else {
            stbi_uc *y = coutput[0];
            if (n == 1)
               for (i=0; i < z->s->img_x; ++i) out[i] = y[i];
            else
               for (i=0; i < z->s->img_x; ++i) *out++ = y[i], *out++ = 255;
         }
      }
      stbi__cleanup_jpeg(z);
      *out_x = z->s->img_x;
      *out_y = z->s->img_y;
      if (comp) *comp = z->s->img_n; // report original components, not output
      return output;
   }
}
static unsigned char *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
{
   stbi__jpeg j;
   j.s = s;
   stbi__setup_jpeg(&j);
   return load_jpeg_image(&j, x,y,comp,req_comp);
}

static int stbi__jpeg_test(stbi__context *s)
{
   int r;
   stbi__jpeg j;
   j.s = s;
   stbi__setup_jpeg(&j);
   r = stbi__decode_jpeg_header(&j, STBI__SCAN_type);
   stbi__rewind(s);
   return r;
}

static int stbi__jpeg_info_raw(stbi__jpeg *j, int *x, int *y, int *comp)
{
   if (!stbi__decode_jpeg_header(j, STBI__SCAN_header)) {
      stbi__rewind( j->s );
      return 0;
   }
   if (x) *x = j->s->img_x;
   if (y) *y = j->s->img_y;
   if (comp) *comp = j->s->img_n;
   return 1;
}

static int stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp)
{
   stbi__jpeg j;
   j.s = s;
   return stbi__jpeg_info_raw(&j, x, y, comp);
}
#endif
// public domain zlib decode    v0.2  Sean Barrett 2006-11-18
//    simple implementation
//      - all input must be provided in an upfront buffer
//      - all output is written to a single output buffer (can malloc/realloc)
//    performance
//      - fast huffman

#ifndef STBI_NO_ZLIB

// fast-way is faster to check than jpeg huffman, but slow way is slower
#define STBI__ZFAST_BITS  9 // accelerate all cases in default tables
#define STBI__ZFAST_MASK  ((1 << STBI__ZFAST_BITS) - 1)

// zlib-style huffman encoding
// (jpeg packs bits from the left, zlib from the right, so they can't share code)
typedef struct
{
   stbi__uint16 fast[1 << STBI__ZFAST_BITS];
   stbi__uint16 firstcode[16];
   int maxcode[17];
   stbi__uint16 firstsymbol[16];
   stbi_uc  size[288];
   stbi__uint16 value[288];
} stbi__zhuffman;
stbi_inline static int stbi__bitreverse16(int n)
{
  n = ((n & 0xAAAA) >>  1) | ((n & 0x5555) << 1);
  n = ((n & 0xCCCC) >>  2) | ((n & 0x3333) << 2);
  n = ((n & 0xF0F0) >>  4) | ((n & 0x0F0F) << 4);
  n = ((n & 0xFF00) >>  8) | ((n & 0x00FF) << 8);
  return n;
}

stbi_inline static int stbi__bit_reverse(int v, int bits)
{
   STBI_ASSERT(bits <= 16);
   // to bit reverse n bits, reverse 16 and shift
   // e.g. 11 bits, bit reverse and shift away 5
   return stbi__bitreverse16(v) >> (16-bits);
}
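// Illustrative sketch, not part of stb_image: the swap-network reversal above
// compared against a plain bit-at-a-time loop. Assuming v fits in `bits` bits,
// both should give the same result; for example the 3-bit value 110 becomes
// 011. Names prefixed sketch_ are hypothetical.

```c
#include <assert.h>

// Reverse the low 16 bits by swapping neighbors, pairs, nibbles, then bytes.
static int sketch_bitreverse16(int n)
{
   n = ((n & 0xAAAA) >> 1) | ((n & 0x5555) << 1);
   n = ((n & 0xCCCC) >> 2) | ((n & 0x3333) << 2);
   n = ((n & 0xF0F0) >> 4) | ((n & 0x0F0F) << 4);
   n = ((n & 0xFF00) >> 8) | ((n & 0x00FF) << 8);
   return n;
}

// Reverse 16 bits, then shift away the (16 - bits) low bits that are not wanted.
static int sketch_bit_reverse_swap(int v, int bits)
{
   assert(bits >= 1 && bits <= 16);
   return sketch_bitreverse16(v) >> (16 - bits);
}

// Reference version: peel bits off the bottom of v and push them into r.
static int sketch_bit_reverse_naive(int v, int bits)
{
   int i, r = 0;
   for (i = 0; i < bits; ++i) {
      r = (r << 1) | (v & 1);
      v >>= 1;
   }
   return r;
}
```

// The swap version takes a fixed four steps regardless of `bits`, which is
// presumably why the decoder uses it when filling its fast lookup table.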
static int stbi__zbuild_huffman(stbi__zhuffman *z, stbi_uc *sizelist, int num)
{
   int i,k=0;
   int code, next_code[16], sizes[17];

   // DEFLATE spec for generating codes
   memset(sizes, 0, sizeof(sizes));
   memset(z->fast, 0, sizeof(z->fast));
   for (i=0; i < num; ++i)
      ++sizes[sizelist[i]];
   sizes[0] = 0;
   for (i=1; i < 16; ++i)
      STBI_ASSERT(sizes[i] <= (1 << i));
   code = 0;
   for (i=1; i < 16; ++i) {
      next_code[i] = code;
      z->firstcode[i] = (stbi__uint16) code;
      z->firstsymbol[i] = (stbi__uint16) k;
      code = (code + sizes[i]);
      if (sizes[i])
         if (code-1 >= (1 << i)) return stbi__err("bad codelengths","Corrupt PNG");
      z->maxcode[i] = code << (16-i); // preshift for inner loop
      code <<= 1;
      k += sizes[i];
   }
   z->maxcode[16] = 0x10000; // sentinel
   for (i=0; i < num; ++i) {
      int s = sizelist[i];
      if (s) {
         int c = next_code[s] - z->firstcode[s] + z->firstsymbol[s];
         stbi__uint16 fastv = (stbi__uint16) ((s << 9) | i);
         z->size [c] = (stbi_uc     ) s;
         z->value[c] = (stbi__uint16) i;
         if (s <= STBI__ZFAST_BITS) {
            int k = stbi__bit_reverse(next_code[s],s);
            while (k < (1 << STBI__ZFAST_BITS)) {
               z->fast[k] = fastv;
               k += (1 << s);
            }
         }
         ++next_code[s];
      }
   }
   return 1;
}
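// Illustrative sketch, not part of stb_image: canonical code assignment as
// described in RFC 1951 section 3.2.2, which is the procedure the function
// above follows before it builds its lookup tables. The sketch only computes
// the codes; it does not build the fast table or check for oversubscribed
// lengths. Names prefixed sketch_/SKETCH_ are hypothetical.

```c
#include <string.h>

#define SKETCH_MAX_BITS 15

// lengths[i] is the code length of symbol i (0 means the symbol is unused);
// codes[i] receives its canonical code. Returns 1 on success, or 0 if any
// length exceeds SKETCH_MAX_BITS.
static int sketch_assign_codes(const unsigned char *lengths, int num, unsigned short *codes)
{
   int bl_count[SKETCH_MAX_BITS + 1];
   int next_code[SKETCH_MAX_BITS + 1];
   int i, bits, code = 0;

   // step 1: count how many symbols use each code length
   memset(bl_count, 0, sizeof(bl_count));
   for (i = 0; i < num; ++i) {
      if (lengths[i] > SKETCH_MAX_BITS) return 0;
      ++bl_count[lengths[i]];
   }
   bl_count[0] = 0;
   next_code[0] = 0;

   // step 2: find the numerically smallest code for each length
   for (bits = 1; bits <= SKETCH_MAX_BITS; ++bits) {
      code = (code + bl_count[bits - 1]) << 1;
      next_code[bits] = code;
   }

   // step 3: hand out consecutive codes within each length, in symbol order
   for (i = 0; i < num; ++i) {
      if (lengths[i]) codes[i] = (unsigned short) next_code[lengths[i]]++;
      else            codes[i] = 0;
   }
   return 1;
}
```

// For the RFC's example lengths (3,3,3,3,3,2,4,4) this yields the codes
// 010, 011, 100, 101, 110, 00, 1110, 1111.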
// zlib-from-memory implementation for PNG reading
//    because PNG allows splitting the zlib stream arbitrarily,
//    and it's annoying structurally to have PNG call ZLIB call PNG,
//    we require PNG read all the IDATs and combine them into a single
//    memory buffer

typedef struct
{
   stbi_uc *zbuffer, *zbuffer_end;
   int num_bits;
   stbi__uint32 code_buffer;

   char *zout;
   char *zout_start;
   char *zout_end;
   int   z_expandable;

   stbi__zhuffman z_length, z_distance;
} stbi__zbuf;

stbi_inline static stbi_uc stbi__zget8(stbi__zbuf *z)
{
   if (z->zbuffer >= z->zbuffer_end) return 0;
   return *z->zbuffer++;
}

static void stbi__fill_bits(stbi__zbuf *z)
{
   do {
      STBI_ASSERT(z->code_buffer < (1U << z->num_bits));
      z->code_buffer |= stbi__zget8(z) << z->num_bits;
      z->num_bits += 8;
   } while (z->num_bits <= 24);
}

stbi_inline static unsigned int stbi__zreceive(stbi__zbuf *z, int n)
{
   unsigned int k;
   if (z->num_bits < n) stbi__fill_bits(z);
   k = z->code_buffer & ((1 << n) - 1);
   z->code_buffer >>= n;
   z->num_bits -= n;
   return k;
}
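// Illustrative sketch, not part of stb_image: a minimal LSB-first bit reader
// in the style of stbi__zreceive. New bytes are ORed in above the bits that
// are still pending, and fields are taken from the bottom of the buffer.
// Unlike the source it refills one byte at a time and only when needed. It
// assumes 1 <= n <= 16 and, like stbi__zget8, reads zeros past the end of
// input. Names prefixed sketch_ are hypothetical.

```c
#include <stdint.h>

typedef struct
{
   const unsigned char *p, *end;   // unread input
   uint32_t buf;                   // pending bits, least significant first
   int nbits;                      // number of valid bits in buf
} sketch_bits;

static unsigned int sketch_get_bits(sketch_bits *r, int n)
{
   unsigned int v;
   while (r->nbits < n) {
      unsigned int byte = (r->p < r->end) ? *r->p++ : 0;
      r->buf |= (uint32_t) byte << r->nbits;   // append above the pending bits
      r->nbits += 8;
   }
   v = r->buf & ((1u << n) - 1);               // take the low n bits
   r->buf >>= n;
   r->nbits -= n;
   return v;
}
```

// Reading 3 bits and then 5 bits from the byte 0xB5 (1011 0101) returns
// 5 (101) and then 22 (10110).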
static int stbi__zhuffman_decode_slowpath(stbi__zbuf *a, stbi__zhuffman *z)
{
   int b,s,k;
   // not resolved by fast table, so compute it the slow way
   // use jpeg approach, which requires MSbits at top
   k = stbi__bit_reverse(a->code_buffer, 16);
   for (s=STBI__ZFAST_BITS+1; ; ++s)
      if (k < z->maxcode[s])
         break;
   if (s == 16) return -1; // invalid code!
   // code size is s, so:
   b = (k >> (16-s)) - z->firstcode[s] + z->firstsymbol[s];
   STBI_ASSERT(z->size[b] == s);
   a->code_buffer >>= s;
   a->num_bits -= s;
   return z->value[b];
}

stbi_inline static int stbi__zhuffman_decode(stbi__zbuf *a, stbi__zhuffman *z)
{
   int b,s;
   if (a->num_bits < 16) stbi__fill_bits(a);
   b = z->fast[a->code_buffer & STBI__ZFAST_MASK];
   if (b) {
      s = b >> 9;
      a->code_buffer >>= s;
      a->num_bits -= s;
      return b & 511;
   }
   return stbi__zhuffman_decode_slowpath(a, z);
}
static int stbi__zexpand(stbi__zbuf *z, char *zout, int n)  // need to make room for n bytes
{
   char *q;
   int cur, limit;
   z->zout = zout;
   if (!z->z_expandable) return stbi__err("output buffer limit","Corrupt PNG");
   cur   = (int) (z->zout     - z->zout_start);
   limit = (int) (z->zout_end - z->zout_start);
   while (cur + n > limit)
      limit *= 2;
   q = (char *) STBI_REALLOC(z->zout_start, limit);
   if (q == NULL) return stbi__err("outofmem", "Out of memory");
   z->zout_start = q;
   z->zout       = q + cur;
   z->zout_end   = q + limit;
   return 1;
}

static int stbi__zlength_base[31] = {
   3,4,5,6,7,8,9,10,11,13,
   15,17,19,23,27,31,35,43,51,59,
   67,83,99,115,131,163,195,227,258,0,0 };

static int stbi__zlength_extra[31]=
{ 0,0,0,0,0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,0,0,0 };

static int stbi__zdist_base[32] = { 1,2,3,4,5,7,9,13,17,25,33,49,65,97,129,193,
257,385,513,769,1025,1537,2049,3073,4097,6145,8193,12289,16385,24577,0,0};

static int stbi__zdist_extra[32] =
{ 0,0,0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13};
static int stbi__parse_huffman_block(stbi__zbuf *a)
{
   char *zout = a->zout;
   for(;;) {
      int z = stbi__zhuffman_decode(a, &a->z_length);
      if (z < 256) {
         if (z < 0) return stbi__err("bad huffman code","Corrupt PNG"); // error in huffman codes
         if (zout >= a->zout_end) {
            if (!stbi__zexpand(a, zout, 1)) return 0;
            zout = a->zout;
         }
         *zout++ = (char) z;
      } else {
         stbi_uc *p;
         int len,dist;
         if (z == 256) {
            a->zout = zout;
            return 1;
         }
         z -= 257;
         len = stbi__zlength_base[z];
         if (stbi__zlength_extra[z]) len += stbi__zreceive(a, stbi__zlength_extra[z]);
         z = stbi__zhuffman_decode(a, &a->z_distance);
         if (z < 0) return stbi__err("bad huffman code","Corrupt PNG");
         dist = stbi__zdist_base[z];
         if (stbi__zdist_extra[z]) dist += stbi__zreceive(a, stbi__zdist_extra[z]);
         if (zout - a->zout_start < dist) return stbi__err("bad dist","Corrupt PNG");
         if (zout + len > a->zout_end) {
            if (!stbi__zexpand(a, zout, len)) return 0;
            zout = a->zout;
         }
         p = (stbi_uc *) (zout - dist);
         if (dist == 1) { // run of one byte; common in images.
            stbi_uc v = *p;
            do *zout++ = v; while (--len);
         } else {
            do *zout++ = *p++; while (--len);
         }
      }
   }
}
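// Illustrative sketch, not part of stb_image: the back-reference copy at the
// end of the loop above, isolated. The copy must go byte by byte because the
// source and destination can overlap: with dist < len the copy re-reads bytes
// it has just written, which is how DEFLATE encodes repeating patterns.
// Assumes pos <= cap. Names prefixed sketch_ are hypothetical.

```c
#include <stddef.h>

// Append len bytes at buf[pos] by copying from dist bytes back.
// Returns the new write position, or 0 if dist reaches before the start of
// the buffer or the output would not fit in cap bytes.
static size_t sketch_lz_copy(unsigned char *buf, size_t cap, size_t pos,
                             size_t dist, size_t len)
{
   size_t i;
   if (dist == 0 || dist > pos || len > cap - pos) return 0;
   for (i = 0; i < len; ++i)
      buf[pos + i] = buf[pos + i - dist];   // may read a byte written earlier in this loop
   return pos + len;
}
```

// Using memcpy here would be incorrect for the overlapping case, since memcpy
// requires non-overlapping regions.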
static int stbi__compute_huffman_codes(stbi__zbuf *a)
{
   static stbi_uc length_dezigzag[19] = { 16,17,18,0,8,7,9,6,10,5,11,4,12,3,13,2,14,1,15 };
   stbi__zhuffman z_codelength;
   stbi_uc lencodes[286+32+137];//padding for maximum single op
   stbi_uc codelength_sizes[19];
   int i,n;

   int hlit  = stbi__zreceive(a,5) + 257;
   int hdist = stbi__zreceive(a,5) + 1;
   int hclen = stbi__zreceive(a,4) + 4;

   memset(codelength_sizes, 0, sizeof(codelength_sizes));
   for (i=0; i < hclen; ++i) {
      int s = stbi__zreceive(a,3);
      codelength_sizes[length_dezigzag[i]] = (stbi_uc) s;
   }
   if (!stbi__zbuild_huffman(&z_codelength, codelength_sizes, 19)) return 0;

   n = 0;
   while (n < hlit + hdist) {
      int c = stbi__zhuffman_decode(a, &z_codelength);
      STBI_ASSERT(c >= 0 && c < 19);
      if (c < 16)
         lencodes[n++] = (stbi_uc) c;
      else if (c == 16) {
         c = stbi__zreceive(a,2)+3;
         memset(lencodes+n, lencodes[n-1], c);
         n += c;
      } else if (c == 17) {
         c = stbi__zreceive(a,3)+3;
         memset(lencodes+n, 0, c);
         n += c;
      } else {
         STBI_ASSERT(c == 18);
         c = stbi__zreceive(a,7)+11;
         memset(lencodes+n, 0, c);
         n += c;
      }
   }
   if (n != hlit+hdist) return stbi__err("bad codelengths","Corrupt PNG");
   if (!stbi__zbuild_huffman(&a->z_length, lencodes, hlit)) return 0;
   if (!stbi__zbuild_huffman(&a->z_distance, lencodes+hlit, hdist)) return 0;
   return 1;
}
static int stbi__parse_uncomperssed_block(stbi__zbuf *a)
{
   stbi_uc header[4];
   int len,nlen,k;
   if (a->num_bits & 7)
      stbi__zreceive(a, a->num_bits & 7); // discard
   // drain the bit-packed data into header
   k = 0;
   while (a->num_bits > 0) {
      header[k++] = (stbi_uc) (a->code_buffer & 255); // suppress MSVC run-time check
      a->code_buffer >>= 8;
      a->num_bits -= 8;
   }
   STBI_ASSERT(a->num_bits == 0);
   // now fill header the normal way
   while (k < 4)
      header[k++] = stbi__zget8(a);
   len  = header[1] * 256 + header[0];
   nlen = header[3] * 256 + header[2];
   if (nlen != (len ^ 0xffff)) return stbi__err("zlib corrupt","Corrupt PNG");
   if (a->zbuffer + len > a->zbuffer_end) return stbi__err("read past buffer","Corrupt PNG");
   if (a->zout + len > a->zout_end)
      if (!stbi__zexpand(a, a->zout, len)) return 0;
   memcpy(a->zout, a->zbuffer, len);
   a->zbuffer += len;
   a->zout += len;
   return 1;
}
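// Illustrative sketch, not part of stb_image: the LEN/NLEN check for a stored
// (uncompressed) DEFLATE block, isolated from the bit-buffer draining above.
// The four header bytes hold LEN and then NLEN, both little-endian, and NLEN
// must be the one's complement of LEN. The name sketch_stored_len is
// hypothetical.

```c
// Returns LEN (0..65535), or -1 if NLEN is not the one's complement of LEN.
static int sketch_stored_len(const unsigned char header[4])
{
   int len  = header[1] * 256 + header[0];   // little-endian LEN
   int nlen = header[3] * 256 + header[2];   // little-endian NLEN
   if (nlen != (len ^ 0xffff)) return -1;
   return len;
}
```

// A 5-byte stored block therefore starts with the bytes 05 00 FA FF.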
static int stbi__parse_zlib_header(stbi__zbuf *a)
{
   int cmf   = stbi__zget8(a);
   int cm    = cmf & 15;
   /* int cinfo = cmf >> 4; */
   int flg   = stbi__zget8(a);
   if ((cmf*256+flg) % 31 != 0) return stbi__err("bad zlib header","Corrupt PNG"); // zlib spec
   if (flg & 32) return stbi__err("no preset dict","Corrupt PNG"); // preset dictionary not allowed in png
   if (cm != 8) return stbi__err("bad compression","Corrupt PNG"); // DEFLATE required for png
   // window = 1 << (8 + cinfo)... but who cares, we fully buffer output
   return 1;
}
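// Illustrative sketch, not part of stb_image: the same three header checks as
// a standalone predicate, with the two bytes passed in directly. The name
// sketch_zlib_header_ok is hypothetical.

```c
// Returns 1 if the two-byte zlib header is acceptable for PNG, else 0.
static int sketch_zlib_header_ok(unsigned char cmf, unsigned char flg)
{
   if ((cmf * 256 + flg) % 31 != 0) return 0;  // FCHECK: 16-bit value must be a multiple of 31
   if (flg & 32) return 0;                      // FDICT: preset dictionary not allowed in PNG
   if ((cmf & 15) != 8) return 0;               // CM: must be 8 (DEFLATE)
   return 1;
}
```

// The common header bytes 78 9C pass all three checks: 0x789C is 30876, which
// is 31 * 996, bit 5 of 0x9C is clear, and the low nibble of 0x78 is 8.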
// @TODO: should statically initialize these for optimal thread safety
static stbi_uc stbi__zdefault_length[288], stbi__zdefault_distance[32];
static void stbi__init_zdefaults(void)
{
   int i;   // use <= to match clearly with spec
   for (i=0; i <= 143; ++i)     stbi__zdefault_length[i]   = 8;
   for (   ; i <= 255; ++i)     stbi__zdefault_length[i]   = 9;
   for (   ; i <= 279; ++i)     stbi__zdefault_length[i]   = 7;
   for (   ; i <= 287; ++i)     stbi__zdefault_length[i]   = 8;

   for (i=0; i <=  31; ++i)     stbi__zdefault_distance[i] = 5;
}
static int stbi__parse_zlib(stbi__zbuf *a, int parse_header)
{
   int final, type;
   if (parse_header)
      if (!stbi__parse_zlib_header(a)) return 0;
   a->num_bits = 0;
   a->code_buffer = 0;
   do {
      final = stbi__zreceive(a,1);
      type = stbi__zreceive(a,2);
      if (type == 0) {
         if (!stbi__parse_uncomperssed_block(a)) return 0;
      } else if (type == 3) {
         return 0;
      } else {
         if (type == 1) {
            // use fixed code lengths
            if (!stbi__zdefault_distance[31]) stbi__init_zdefaults();
            if (!stbi__zbuild_huffman(&a->z_length  , stbi__zdefault_length  , 288)) return 0;
            if (!stbi__zbuild_huffman(&a->z_distance, stbi__zdefault_distance,  32)) return 0;
         } else {
            if (!stbi__compute_huffman_codes(a)) return 0;
         }
         if (!stbi__parse_huffman_block(a)) return 0;
      }
   } while (!final);
   return 1;
}

static int stbi__do_zlib(stbi__zbuf *a, char *obuf, int olen, int exp, int parse_header)
{
   a->zout_start = obuf;
   a->zout       = obuf;
   a->zout_end   = obuf + olen;
   a->z_expandable = exp;

   return stbi__parse_zlib(a, parse_header);
}
STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen)
{
   stbi__zbuf a;
   char *p = (char *) stbi__malloc(initial_size);
   if (p == NULL) return NULL;
   a.zbuffer = (stbi_uc *) buffer;
   a.zbuffer_end = (stbi_uc *) buffer + len;
   if (stbi__do_zlib(&a, p, initial_size, 1, 1)) {
      if (outlen) *outlen = (int) (a.zout - a.zout_start);
      return a.zout_start;
   } else {
      STBI_FREE(a.zout_start);
      return NULL;
   }
}

STBIDEF char *stbi_zlib_decode_malloc(char const *buffer, int len, int *outlen)
{
   return stbi_zlib_decode_malloc_guesssize(buffer, len, 16384, outlen);
}

STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header)
{
   stbi__zbuf a;
   char *p = (char *) stbi__malloc(initial_size);
   if (p == NULL) return NULL;
   a.zbuffer = (stbi_uc *) buffer;
   a.zbuffer_end = (stbi_uc *) buffer + len;
   if (stbi__do_zlib(&a, p, initial_size, 1, parse_header)) {
      if (outlen) *outlen = (int) (a.zout - a.zout_start);
      return a.zout_start;
   } else {
      STBI_FREE(a.zout_start);
      return NULL;
   }
}

STBIDEF int stbi_zlib_decode_buffer(char *obuffer, int olen, char const *ibuffer, int ilen)
{
   stbi__zbuf a;
   a.zbuffer = (stbi_uc *) ibuffer;
   a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
   if (stbi__do_zlib(&a, obuffer, olen, 0, 1))
      return (int) (a.zout - a.zout_start);
   else
      return -1;
}

STBIDEF char *stbi_zlib_decode_noheader_malloc(char const *buffer, int len, int *outlen)
{
   stbi__zbuf a;
   char *p = (char *) stbi__malloc(16384);
   if (p == NULL) return NULL;
   a.zbuffer = (stbi_uc *) buffer;
   a.zbuffer_end = (stbi_uc *) buffer+len;
   if (stbi__do_zlib(&a, p, 16384, 1, 0)) {
      if (outlen) *outlen = (int) (a.zout - a.zout_start);
      return a.zout_start;
   } else {
      STBI_FREE(a.zout_start);
      return NULL;
   }
}

STBIDEF int stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen)
{
   stbi__zbuf a;
   a.zbuffer = (stbi_uc *) ibuffer;
   a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
   if (stbi__do_zlib(&a, obuffer, olen, 0, 0))
      return (int) (a.zout - a.zout_start);
   else
      return -1;
}
#endif
// public domain "baseline" PNG decoder   v0.10  Sean Barrett 2006-11-18
//    simple implementation
//      - only 8-bit samples
//      - no CRC checking
//      - allocates lots of intermediate memory
//        - avoids problem of streaming data between subsystems
//        - avoids explicit window management
//    performance
//      - uses stb_zlib, a PD zlib implementation with fast huffman decoding

#ifndef STBI_NO_PNG
typedef struct
{
   stbi__uint32 length;
   stbi__uint32 type;
} stbi__pngchunk;

static stbi__pngchunk stbi__get_chunk_header(stbi__context *s)
{
   stbi__pngchunk c;
   c.length = stbi__get32be(s);
   c.type   = stbi__get32be(s);
   return c;
}

static int stbi__check_png_header(stbi__context *s)
{
   static stbi_uc png_sig[8] = { 137,80,78,71,13,10,26,10 };
   int i;
   for (i=0; i < 8; ++i)
      if (stbi__get8(s) != png_sig[i]) return stbi__err("bad png sig","Not a PNG");
   return 1;
}

typedef struct
{
   stbi__context *s;
   stbi_uc *idata, *expanded, *out;
} stbi__png;
enum {
   STBI__F_none=0,
   STBI__F_sub=1,
   STBI__F_up=2,
   STBI__F_avg=3,
   STBI__F_paeth=4,
   // synthetic filters used for first scanline to avoid needing a dummy row of 0s
   STBI__F_avg_first,
   STBI__F_paeth_first
};

static stbi_uc first_row_filter[5] =
{
   STBI__F_none,
   STBI__F_sub,
   STBI__F_none,
   STBI__F_avg_first,
   STBI__F_paeth_first
};

static int stbi__paeth(int a, int b, int c)
{
   int p = a + b - c;
   int pa = abs(p-a);
   int pb = abs(p-b);
   int pc = abs(p-c);
   if (pa <= pb && pa <= pc) return a;
   if (pb <= pc) return b;
   return c;
}
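// Illustrative sketch, not part of stb_image: the Paeth predictor applied to
// one byte. In PNG's filter, a is the byte to the left, b the byte above, and
// c the byte above-left; the predictor picks whichever of the three is closest
// to a + b - c, and unfiltering adds that prediction back modulo 256. Names
// prefixed sketch_ are hypothetical.

```c
#include <stdlib.h>

static int sketch_paeth(int a, int b, int c)
{
   int p  = a + b - c;        // initial linear estimate
   int pa = abs(p - a);
   int pb = abs(p - b);
   int pc = abs(p - c);
   if (pa <= pb && pa <= pc) return a;   // ties favor left, then up
   if (pb <= pc) return b;
   return c;
}

// Undo the Paeth filter for one byte.
static unsigned char sketch_unfilter_paeth(unsigned char raw, unsigned char left,
                                           unsigned char up, unsigned char upleft)
{
   return (unsigned char) ((raw + sketch_paeth(left, up, upleft)) & 255);
}
```

// With no left neighbor (a == c == 0) the predictor reduces to b, the byte above.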
3855 static stbi_uc stbi__depth_scale_table[9] = { 0, 0xff, 0x55, 0, 0x11, 0,0,0, 0x01 };
3857 // create the png data from post-deflated data
3858 static int stbi__create_png_image_raw(stbi__png *a, stbi_uc *raw, stbi__uint32 raw_len, int out_n, stbi__uint32 x, stbi__uint32 y, int depth, int color)
3860 stbi__context *s = a->s;
3861 stbi__uint32 i,j,stride = x*out_n;
3862 stbi__uint32 img_len, img_width_bytes;
3863 int k;
3864 int img_n = s->img_n; // copy it into a local for later
3866 STBI_ASSERT(out_n == s->img_n || out_n == s->img_n+1);
3867 a->out = (stbi_uc *) stbi__malloc(x * y * out_n); // extra bytes to write off the end into
3868 if (!a->out) return stbi__err("outofmem", "Out of memory");
3870 img_width_bytes = (((img_n * x * depth) + 7) >> 3);
3871 img_len = (img_width_bytes + 1) * y;
3872 if (s->img_x == x && s->img_y == y) {
3873 if (raw_len != img_len) return stbi__err("not enough pixels","Corrupt PNG");
3874 } else { // interlaced:
3875 if (raw_len < img_len) return stbi__err("not enough pixels","Corrupt PNG");
3878 for (j=0; j < y; ++j) {
3879 stbi_uc *cur = a->out + stride*j;
3880 stbi_uc *prior = cur - stride;
3881 int filter = *raw++;
3882 int filter_bytes = img_n;
3883 int width = x;
3884 if (filter > 4)
3885 return stbi__err("invalid filter","Corrupt PNG");
3887 if (depth < 8) {
3888 STBI_ASSERT(img_width_bytes <= x);
3889 cur += x*out_n - img_width_bytes; // store output to the rightmost img_width_bytes bytes of the row, so we can decode in place
3890 filter_bytes = 1;
3891 width = img_width_bytes;
3894 // if first row, use special filter that doesn't sample previous row
3895 if (j == 0) filter = first_row_filter[filter];
3897 // handle first byte explicitly
3898 for (k=0; k < filter_bytes; ++k) {
3899 switch (filter) {
3900 case STBI__F_none : cur[k] = raw[k]; break;
3901 case STBI__F_sub : cur[k] = raw[k]; break;
3902 case STBI__F_up : cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break;
3903 case STBI__F_avg : cur[k] = STBI__BYTECAST(raw[k] + (prior[k]>>1)); break;
3904 case STBI__F_paeth : cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(0,prior[k],0)); break;
3905 case STBI__F_avg_first : cur[k] = raw[k]; break;
3906 case STBI__F_paeth_first: cur[k] = raw[k]; break;
3910 if (depth == 8) {
3911 if (img_n != out_n)
3912 cur[img_n] = 255; // first pixel
3913 raw += img_n;
3914 cur += out_n;
3915 prior += out_n;
3916 } else {
3917 raw += 1;
3918 cur += 1;
3919 prior += 1;
3922 // this is a little gross, so that we don't switch per-pixel or per-component
3923 if (depth < 8 || img_n == out_n) {
3924 int nk = (width - 1)*img_n;
3925 #define CASE(f) \
3926 case f: \
3927 for (k=0; k < nk; ++k)
3928 switch (filter) {
3929 // "none" filter turns into a memcpy here; make that explicit.
3930 case STBI__F_none: memcpy(cur, raw, nk); break;
3931 CASE(STBI__F_sub) cur[k] = STBI__BYTECAST(raw[k] + cur[k-filter_bytes]); break;
3932 CASE(STBI__F_up) cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break;
3933 CASE(STBI__F_avg) cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k-filter_bytes])>>1)); break;
3934 CASE(STBI__F_paeth) cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],prior[k],prior[k-filter_bytes])); break;
3935 CASE(STBI__F_avg_first) cur[k] = STBI__BYTECAST(raw[k] + (cur[k-filter_bytes] >> 1)); break;
3936 CASE(STBI__F_paeth_first) cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],0,0)); break;
3938 #undef CASE
3939 raw += nk;
3940 } else {
3941 STBI_ASSERT(img_n+1 == out_n);
3942 #define CASE(f) \
3943 case f: \
3944 for (i=x-1; i >= 1; --i, cur[img_n]=255,raw+=img_n,cur+=out_n,prior+=out_n) \
3945 for (k=0; k < img_n; ++k)
3946 switch (filter) {
3947 CASE(STBI__F_none) cur[k] = raw[k]; break;
3948 CASE(STBI__F_sub) cur[k] = STBI__BYTECAST(raw[k] + cur[k-out_n]); break;
3949 CASE(STBI__F_up) cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break;
3950 CASE(STBI__F_avg) cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k-out_n])>>1)); break;
3951 CASE(STBI__F_paeth) cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-out_n],prior[k],prior[k-out_n])); break;
3952 CASE(STBI__F_avg_first) cur[k] = STBI__BYTECAST(raw[k] + (cur[k-out_n] >> 1)); break;
3953 CASE(STBI__F_paeth_first) cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-out_n],0,0)); break;
3955 #undef CASE
3959 // we make a separate pass to expand bits to pixels; for performance,
3960 // this could run two scanlines behind the above code, so it won't
3961 // interfere with filtering but will still be in the cache.
3962 if (depth < 8) {
3963 for (j=0; j < y; ++j) {
3964 stbi_uc *cur = a->out + stride*j;
3965 stbi_uc *in = a->out + stride*j + x*out_n - img_width_bytes;
3966 // unpack 1/2/4-bit into an 8-bit buffer. allows us to keep the common 8-bit path optimal at minimal cost for 1/2/4-bit
3967 // PNG guarantees byte alignment; if width is not a multiple of 8/4/2, we'll decode dummy trailing data that will be skipped in the later loop
3968 stbi_uc scale = (color == 0) ? stbi__depth_scale_table[depth] : 1; // scale grayscale values to 0..255 range
3970 // note that the final byte might overshoot and write more data than desired.
3971 // we can allocate enough data that this never writes out of memory, but it
3972 // could also overwrite the next scanline. can it overwrite non-empty data
3973 // on the next scanline? yes, consider 1-pixel-wide scanlines with 1-bit-per-pixel.
3974 // so we need to explicitly clamp the final ones
3976 if (depth == 4) {
3977 for (k=x*img_n; k >= 2; k-=2, ++in) {
3978 *cur++ = scale * ((*in >> 4) );
3979 *cur++ = scale * ((*in ) & 0x0f);
3981 if (k > 0) *cur++ = scale * ((*in >> 4) );
3982 } else if (depth == 2) {
3983 for (k=x*img_n; k >= 4; k-=4, ++in) {
3984 *cur++ = scale * ((*in >> 6) );
3985 *cur++ = scale * ((*in >> 4) & 0x03);
3986 *cur++ = scale * ((*in >> 2) & 0x03);
3987 *cur++ = scale * ((*in ) & 0x03);
3989 if (k > 0) *cur++ = scale * ((*in >> 6) );
3990 if (k > 1) *cur++ = scale * ((*in >> 4) & 0x03);
3991 if (k > 2) *cur++ = scale * ((*in >> 2) & 0x03);
3992 } else if (depth == 1) {
3993 for (k=x*img_n; k >= 8; k-=8, ++in) {
3994 *cur++ = scale * ((*in >> 7) );
3995 *cur++ = scale * ((*in >> 6) & 0x01);
3996 *cur++ = scale * ((*in >> 5) & 0x01);
3997 *cur++ = scale * ((*in >> 4) & 0x01);
3998 *cur++ = scale * ((*in >> 3) & 0x01);
3999 *cur++ = scale * ((*in >> 2) & 0x01);
4000 *cur++ = scale * ((*in >> 1) & 0x01);
4001 *cur++ = scale * ((*in ) & 0x01);
4003 if (k > 0) *cur++ = scale * ((*in >> 7) );
4004 if (k > 1) *cur++ = scale * ((*in >> 6) & 0x01);
4005 if (k > 2) *cur++ = scale * ((*in >> 5) & 0x01);
4006 if (k > 3) *cur++ = scale * ((*in >> 4) & 0x01);
4007 if (k > 4) *cur++ = scale * ((*in >> 3) & 0x01);
4008 if (k > 5) *cur++ = scale * ((*in >> 2) & 0x01);
4009 if (k > 6) *cur++ = scale * ((*in >> 1) & 0x01);
4011 if (img_n != out_n) {
4012 // insert alpha = 255
4013 stbi_uc *cur = a->out + stride*j;
4014 int i;
4015 if (img_n == 1) {
4016 for (i=x-1; i >= 0; --i) {
4017 cur[i*2+1] = 255;
4018 cur[i*2+0] = cur[i];
4020 } else {
4021 STBI_ASSERT(img_n == 3);
4022 for (i=x-1; i >= 0; --i) {
4023 cur[i*4+3] = 255;
4024 cur[i*4+2] = cur[i*3+2];
4025 cur[i*4+1] = cur[i*3+1];
4026 cur[i*4+0] = cur[i*3+0];
4033 return 1;
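The bit-expansion pass above unrolls the 4-, 2- and 1-bit cases by hand and decodes in place. The following is a simplified sketch of the same unpacking with separate source and destination buffers; `unpack_low_depth` is a hypothetical name, and `scale` plays the role of the `stbi__depth_scale_table` entry (0xff, 0x55 or 0x11 for grayscale, 1 for palette indices).

```c
#include <assert.h>
#include <stddef.h>

/* Unpack `count` samples of `depth` bits each (1, 2 or 4), stored most
   significant bits first, into one byte per sample, multiplied by scale. */
static void unpack_low_depth(unsigned char *dst, const unsigned char *src,
                             size_t count, int depth, unsigned char scale)
{
    size_t i;
    int per_byte, mask;
    assert(depth == 1 || depth == 2 || depth == 4);
    per_byte = 8 / depth;            /* samples packed into each source byte */
    mask = (1 << depth) - 1;
    for (i = 0; i < count; ++i) {
        int shift = 8 - depth * (int)(i % (size_t)per_byte + 1);
        dst[i] = (unsigned char)(scale * ((src[i / (size_t)per_byte] >> shift) & mask));
    }
}
```

Because `count` bounds the loop, trailing padding bits in the last source byte are never written out, which is the same clamping the hand-unrolled tails perform.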
4036 static int stbi__create_png_image(stbi__png *a, stbi_uc *image_data, stbi__uint32 image_data_len, int out_n, int depth, int color, int interlaced)
4038 stbi_uc *final;
4039 int p;
4040 if (!interlaced)
4041 return stbi__create_png_image_raw(a, image_data, image_data_len, out_n, a->s->img_x, a->s->img_y, depth, color);
4043 // de-interlacing
4044 final = (stbi_uc *) stbi__malloc(a->s->img_x * a->s->img_y * out_n);
4045 for (p=0; p < 7; ++p) {
4046 int xorig[] = { 0,4,0,2,0,1,0 };
4047 int yorig[] = { 0,0,4,0,2,0,1 };
4048 int xspc[] = { 8,8,4,4,2,2,1 };
4049 int yspc[] = { 8,8,8,4,4,2,2 };
4050 int i,j,x,y;
4051 // pass1_x[4] = 0, pass1_x[5] = 1, pass1_x[12] = 1
4052 x = (a->s->img_x - xorig[p] + xspc[p]-1) / xspc[p];
4053 y = (a->s->img_y - yorig[p] + yspc[p]-1) / yspc[p];
4054 if (x && y) {
4055 stbi__uint32 img_len = ((((a->s->img_n * x * depth) + 7) >> 3) + 1) * y;
4056 if (!stbi__create_png_image_raw(a, image_data, image_data_len, out_n, x, y, depth, color)) {
4057 STBI_FREE(final);
4058 return 0;
4060 for (j=0; j < y; ++j) {
4061 for (i=0; i < x; ++i) {
4062 int out_y = j*yspc[p]+yorig[p];
4063 int out_x = i*xspc[p]+xorig[p];
4064 memcpy(final + out_y*a->s->img_x*out_n + out_x*out_n,
4065 a->out + (j*x+i)*out_n, out_n);
4068 STBI_FREE(a->out);
4069 image_data += img_len;
4070 image_data_len -= img_len;
4073 a->out = final;
4075 return 1;
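The de-interlacing loop derives each Adam7 pass size from the origin and spacing tables. A minimal standalone sketch of that arithmetic follows; the function names are illustrative, and the byte count follows the `img_len` expression used in the loop above.

```c
#include <assert.h>

/* Adam7 origin and spacing per pass, same values as the tables above. */
static const int adam7_xorig[7] = { 0,4,0,2,0,1,0 };
static const int adam7_yorig[7] = { 0,0,4,0,2,0,1 };
static const int adam7_xspc[7]  = { 8,8,4,4,2,2,1 };
static const int adam7_yspc[7]  = { 8,8,8,4,4,2,2 };

/* Width and height in pixels of one pass of a w x h image. */
static void adam7_pass_size(int pass, int w, int h, int *pw, int *ph)
{
    assert(pass >= 0 && pass < 7 && w > 0 && h > 0);
    *pw = (w - adam7_xorig[pass] + adam7_xspc[pass] - 1) / adam7_xspc[pass];
    *ph = (h - adam7_yorig[pass] + adam7_yspc[pass] - 1) / adam7_yspc[pass];
}

/* Filtered byte length of one pass: one filter byte plus the packed
   pixel bytes per row. Empty passes contribute no bytes. */
static unsigned long adam7_pass_bytes(int pass, int w, int h, int channels, int depth)
{
    int pw, ph;
    adam7_pass_size(pass, w, h, &pw, &ph);
    if (pw == 0 || ph == 0) return 0;
    return ((((unsigned long)channels * (unsigned long)pw * (unsigned long)depth + 7) >> 3) + 1)
           * (unsigned long)ph;
}
```

Summing `pw * ph` over the seven passes gives back `w * h`, which is a convenient sanity check on the tables.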
4078 static int stbi__compute_transparency(stbi__png *z, stbi_uc tc[3], int out_n)
4080 stbi__context *s = z->s;
4081 stbi__uint32 i, pixel_count = s->img_x * s->img_y;
4082 stbi_uc *p = z->out;
4084 // compute color-based transparency, assuming we've
4085 // already got 255 as the alpha value in the output
4086 STBI_ASSERT(out_n == 2 || out_n == 4);
4088 if (out_n == 2) {
4089 for (i=0; i < pixel_count; ++i) {
4090 p[1] = (p[0] == tc[0] ? 0 : 255);
4091 p += 2;
4093 } else {
4094 for (i=0; i < pixel_count; ++i) {
4095 if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
4096 p[3] = 0;
4097 p += 4;
4100 return 1;
4103 static int stbi__expand_png_palette(stbi__png *a, stbi_uc *palette, int len, int pal_img_n)
4105 stbi__uint32 i, pixel_count = a->s->img_x * a->s->img_y;
4106 stbi_uc *p, *temp_out, *orig = a->out;
4108 p = (stbi_uc *) stbi__malloc(pixel_count * pal_img_n);
4109 if (p == NULL) return stbi__err("outofmem", "Out of memory");
4111 // between here and free(out) below, exiting would leak
4112 temp_out = p;
4114 if (pal_img_n == 3) {
4115 for (i=0; i < pixel_count; ++i) {
4116 int n = orig[i]*4;
4117 p[0] = palette[n ];
4118 p[1] = palette[n+1];
4119 p[2] = palette[n+2];
4120 p += 3;
4122 } else {
4123 for (i=0; i < pixel_count; ++i) {
4124 int n = orig[i]*4;
4125 p[0] = palette[n ];
4126 p[1] = palette[n+1];
4127 p[2] = palette[n+2];
4128 p[3] = palette[n+3];
4129 p += 4;
4132 STBI_FREE(a->out);
4133 a->out = temp_out;
4135 STBI_NOTUSED(len);
4137 return 1;
4140 static int stbi__unpremultiply_on_load = 0;
4141 static int stbi__de_iphone_flag = 0;
4143 STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply)
4145 stbi__unpremultiply_on_load = flag_true_if_should_unpremultiply;
4148 STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert)
4150 stbi__de_iphone_flag = flag_true_if_should_convert;
4153 static void stbi__de_iphone(stbi__png *z)
4155 stbi__context *s = z->s;
4156 stbi__uint32 i, pixel_count = s->img_x * s->img_y;
4157 stbi_uc *p = z->out;
4159 if (s->img_out_n == 3) { // convert bgr to rgb
4160 for (i=0; i < pixel_count; ++i) {
4161 stbi_uc t = p[0];
4162 p[0] = p[2];
4163 p[2] = t;
4164 p += 3;
4166 } else {
4167 STBI_ASSERT(s->img_out_n == 4);
4168 if (stbi__unpremultiply_on_load) {
4169 // convert bgr to rgb and unpremultiply
4170 for (i=0; i < pixel_count; ++i) {
4171 stbi_uc a = p[3];
4172 stbi_uc t = p[0];
4173 if (a) {
4174 p[0] = p[2] * 255 / a;
4175 p[1] = p[1] * 255 / a;
4176 p[2] = t * 255 / a;
4177 } else {
4178 p[0] = p[2];
4179 p[2] = t;
4181 p += 4;
4183 } else {
4184 // convert bgr to rgb
4185 for (i=0; i < pixel_count; ++i) {
4186 stbi_uc t = p[0];
4187 p[0] = p[2];
4188 p[2] = t;
4189 p += 4;
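As an illustration of the 4-channel branch with unpremultiply enabled, here is a hedged standalone sketch that swaps BGRA to RGBA and divides out the alpha. The name `bgra_to_rgba_unpremultiply` is hypothetical. The sketch assumes each color value is no larger than its alpha; otherwise the result is truncated to a byte.

```c
#include <assert.h>
#include <stddef.h>

/* Convert premultiplied BGRA pixels to straight-alpha RGBA in place.
   Pixels with alpha 0 are only channel-swapped, since there is nothing
   meaningful to divide by. */
static void bgra_to_rgba_unpremultiply(unsigned char *p, size_t pixel_count)
{
    size_t i;
    for (i = 0; i < pixel_count; ++i) {
        unsigned char a = p[3];
        unsigned char b = p[0];          /* save blue before overwriting */
        if (a) {
            p[0] = (unsigned char)(p[2] * 255 / a);
            p[1] = (unsigned char)(p[1] * 255 / a);
            p[2] = (unsigned char)(b * 255 / a);
        } else {
            p[0] = p[2];
            p[2] = b;
        }
        p += 4;
    }
}
```

The integer division rounds down, so a round trip through premultiply and unpremultiply can lose up to one level per channel.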
4195 #define STBI__PNG_TYPE(a,b,c,d) (((a) << 24) + ((b) << 16) + ((c) << 8) + (d))
4197 static int stbi__parse_png_file(stbi__png *z, int scan, int req_comp)
4199 stbi_uc palette[1024], pal_img_n=0;
4200 stbi_uc has_trans=0, tc[3];
4201 stbi__uint32 ioff=0, idata_limit=0, i, pal_len=0;
4202 int first=1,k,interlace=0, color=0, depth=0, is_iphone=0;
4203 stbi__context *s = z->s;
4205 z->expanded = NULL;
4206 z->idata = NULL;
4207 z->out = NULL;
4209 if (!stbi__check_png_header(s)) return 0;
4211 if (scan == STBI__SCAN_type) return 1;
4213 for (;;) {
4214 stbi__pngchunk c = stbi__get_chunk_header(s);
4215 switch (c.type) {
4216 case STBI__PNG_TYPE('C','g','B','I'):
4217 is_iphone = 1;
4218 stbi__skip(s, c.length);
4219 break;
4220 case STBI__PNG_TYPE('I','H','D','R'): {
4221 int comp,filter;
4222 if (!first) return stbi__err("multiple IHDR","Corrupt PNG");
4223 first = 0;
4224 if (c.length != 13) return stbi__err("bad IHDR len","Corrupt PNG");
4225 s->img_x = stbi__get32be(s); if (s->img_x > (1 << 24)) return stbi__err("too large","Very large image (corrupt?)");
4226 s->img_y = stbi__get32be(s); if (s->img_y > (1 << 24)) return stbi__err("too large","Very large image (corrupt?)");
4227 depth = stbi__get8(s); if (depth != 1 && depth != 2 && depth != 4 && depth != 8) return stbi__err("1/2/4/8-bit only","PNG not supported: 1/2/4/8-bit only");
4228 color = stbi__get8(s); if (color > 6) return stbi__err("bad ctype","Corrupt PNG");
4229 if (color == 3) pal_img_n = 3; else if (color & 1) return stbi__err("bad ctype","Corrupt PNG");
4230 comp = stbi__get8(s); if (comp) return stbi__err("bad comp method","Corrupt PNG");
4231 filter= stbi__get8(s); if (filter) return stbi__err("bad filter method","Corrupt PNG");
4232 interlace = stbi__get8(s); if (interlace>1) return stbi__err("bad interlace method","Corrupt PNG");
4233 if (!s->img_x || !s->img_y) return stbi__err("0-pixel image","Corrupt PNG");
4234 if (!pal_img_n) {
4235 s->img_n = (color & 2 ? 3 : 1) + (color & 4 ? 1 : 0);
4236 if ((1 << 30) / s->img_x / s->img_n < s->img_y) return stbi__err("too large", "Image too large to decode");
4237 if (scan == STBI__SCAN_header) return 1;
4238 } else {
4239 // if paletted, then pal_img_n is our final components, and
4240 // img_n is # components to decompress/filter.
4241 s->img_n = 1;
4242 if ((1 << 30) / s->img_x / 4 < s->img_y) return stbi__err("too large","Corrupt PNG");
4243 // if SCAN_header, have to scan to see if we have a tRNS
4245 break;
4248 case STBI__PNG_TYPE('P','L','T','E'): {
4249 if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4250 if (c.length > 256*3) return stbi__err("invalid PLTE","Corrupt PNG");
4251 pal_len = c.length / 3;
4252 if (pal_len * 3 != c.length) return stbi__err("invalid PLTE","Corrupt PNG");
4253 for (i=0; i < pal_len; ++i) {
4254 palette[i*4+0] = stbi__get8(s);
4255 palette[i*4+1] = stbi__get8(s);
4256 palette[i*4+2] = stbi__get8(s);
4257 palette[i*4+3] = 255;
4259 break;
4262 case STBI__PNG_TYPE('t','R','N','S'): {
4263 if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4264 if (z->idata) return stbi__err("tRNS after IDAT","Corrupt PNG");
4265 if (pal_img_n) {
4266 if (scan == STBI__SCAN_header) { s->img_n = 4; return 1; }
4267 if (pal_len == 0) return stbi__err("tRNS before PLTE","Corrupt PNG");
4268 if (c.length > pal_len) return stbi__err("bad tRNS len","Corrupt PNG");
4269 pal_img_n = 4;
4270 for (i=0; i < c.length; ++i)
4271 palette[i*4+3] = stbi__get8(s);
4272 } else {
4273 if (!(s->img_n & 1)) return stbi__err("tRNS with alpha","Corrupt PNG");
4274 if (c.length != (stbi__uint32) s->img_n*2) return stbi__err("bad tRNS len","Corrupt PNG");
4275 has_trans = 1;
4276 for (k=0; k < s->img_n; ++k)
4277 tc[k] = (stbi_uc) (stbi__get16be(s) & 255) * stbi__depth_scale_table[depth]; // non 8-bit images will be larger
4279 break;
4282 case STBI__PNG_TYPE('I','D','A','T'): {
4283 if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4284 if (pal_img_n && !pal_len) return stbi__err("no PLTE","Corrupt PNG");
4285 if (scan == STBI__SCAN_header) { s->img_n = pal_img_n; return 1; }
4286 if (ioff + c.length > idata_limit) {
4287 stbi_uc *p;
4288 if (idata_limit == 0) idata_limit = c.length > 4096 ? c.length : 4096;
4289 while (ioff + c.length > idata_limit)
4290 idata_limit *= 2;
4291 p = (stbi_uc *) STBI_REALLOC(z->idata, idata_limit); if (p == NULL) return stbi__err("outofmem", "Out of memory");
4292 z->idata = p;
4294 if (!stbi__getn(s, z->idata+ioff,c.length)) return stbi__err("outofdata","Corrupt PNG");
4295 ioff += c.length;
4296 break;
4299 case STBI__PNG_TYPE('I','E','N','D'): {
4300 stbi__uint32 raw_len, bpl;
4301 if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4302 if (scan != STBI__SCAN_load) return 1;
4303 if (z->idata == NULL) return stbi__err("no IDAT","Corrupt PNG");
4304 // initial guess for decoded data size to avoid unnecessary reallocs
4305 bpl = (s->img_x * depth + 7) / 8; // bytes per line, per component
4306 raw_len = bpl * s->img_y * s->img_n /* pixels */ + s->img_y /* filter mode per row */;
4307 z->expanded = (stbi_uc *) stbi_zlib_decode_malloc_guesssize_headerflag((char *) z->idata, ioff, raw_len, (int *) &raw_len, !is_iphone);
4308 if (z->expanded == NULL) return 0; // zlib should set error
4309 STBI_FREE(z->idata); z->idata = NULL;
4310 if ((req_comp == s->img_n+1 && req_comp != 3 && !pal_img_n) || has_trans)
4311 s->img_out_n = s->img_n+1;
4312 else
4313 s->img_out_n = s->img_n;
4314 if (!stbi__create_png_image(z, z->expanded, raw_len, s->img_out_n, depth, color, interlace)) return 0;
4315 if (has_trans)
4316 if (!stbi__compute_transparency(z, tc, s->img_out_n)) return 0;
4317 if (is_iphone && stbi__de_iphone_flag && s->img_out_n > 2)
4318 stbi__de_iphone(z);
4319 if (pal_img_n) {
4320 // pal_img_n == 3 or 4
4321 s->img_n = pal_img_n; // record the actual colors we had
4322 s->img_out_n = pal_img_n;
4323 if (req_comp >= 3) s->img_out_n = req_comp;
4324 if (!stbi__expand_png_palette(z, palette, pal_len, s->img_out_n))
4325 return 0;
4327 STBI_FREE(z->expanded); z->expanded = NULL;
4328 return 1;
4331 default:
4332 // if critical, fail
4333 if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4334 if ((c.type & (1 << 29)) == 0) {
4335 #ifndef STBI_NO_FAILURE_STRINGS
4336 // not threadsafe
4337 static char invalid_chunk[] = "XXXX PNG chunk not known";
4338 invalid_chunk[0] = STBI__BYTECAST(c.type >> 24);
4339 invalid_chunk[1] = STBI__BYTECAST(c.type >> 16);
4340 invalid_chunk[2] = STBI__BYTECAST(c.type >> 8);
4341 invalid_chunk[3] = STBI__BYTECAST(c.type >> 0);
4342 #endif
4343 return stbi__err(invalid_chunk, "PNG not supported: unknown PNG chunk type");
4345 stbi__skip(s, c.length);
4346 break;
4348 // end of PNG chunk, read and skip CRC
4349 stbi__get32be(s);
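The default case above decides whether an unknown chunk may be skipped by testing bit 29 of the packed type, which is bit 5 (the lowercase bit) of the first type character. A small sketch of that packing and test, using `<stdint.h>` types and hypothetical names rather than stb_image's own:

```c
#include <assert.h>
#include <stdint.h>

/* Pack a four-character chunk type big-endian, as STBI__PNG_TYPE does. */
#define PNG_TYPE(a,b,c,d) (((uint32_t)(a) << 24) + ((uint32_t)(b) << 16) + \
                           ((uint32_t)(c) << 8) + (uint32_t)(d))

/* A chunk is critical when the first letter is uppercase, i.e. bit 5 of
   the first byte (bit 29 of the packed value) is clear. */
static int png_chunk_is_critical(uint32_t type)
{
    return (type & (UINT32_C(1) << 29)) == 0;
}
```

Ancillary chunks such as `tEXt` can be skipped safely, while an unrecognized critical chunk means the image cannot be decoded correctly.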
4353 static unsigned char *stbi__do_png(stbi__png *p, int *x, int *y, int *n, int req_comp)
4355 unsigned char *result=NULL;
4356 if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
4357 if (stbi__parse_png_file(p, STBI__SCAN_load, req_comp)) {
4358 result = p->out;
4359 p->out = NULL;
4360 if (req_comp && req_comp != p->s->img_out_n) {
4361 result = stbi__convert_format(result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
4362 p->s->img_out_n = req_comp;
4363 if (result == NULL) return result;
4365 *x = p->s->img_x;
4366 *y = p->s->img_y;
4367 if (n) *n = p->s->img_out_n;
4369 STBI_FREE(p->out); p->out = NULL;
4370 STBI_FREE(p->expanded); p->expanded = NULL;
4371 STBI_FREE(p->idata); p->idata = NULL;
4373 return result;
4376 static unsigned char *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
4378 stbi__png p;
4379 p.s = s;
4380 return stbi__do_png(&p, x,y,comp,req_comp);
4383 static int stbi__png_test(stbi__context *s)
4385 int r;
4386 r = stbi__check_png_header(s);
4387 stbi__rewind(s);
4388 return r;
4391 static int stbi__png_info_raw(stbi__png *p, int *x, int *y, int *comp)
4393 if (!stbi__parse_png_file(p, STBI__SCAN_header, 0)) {
4394 stbi__rewind( p->s );
4395 return 0;
4397 if (x) *x = p->s->img_x;
4398 if (y) *y = p->s->img_y;
4399 if (comp) *comp = p->s->img_n;
4400 return 1;
4403 static int stbi__png_info(stbi__context *s, int *x, int *y, int *comp)
4405 stbi__png p;
4406 p.s = s;
4407 return stbi__png_info_raw(&p, x, y, comp);
4409 #endif
4411 // Microsoft/Windows BMP image
4413 #ifndef STBI_NO_BMP
4414 static int stbi__bmp_test_raw(stbi__context *s)
4416 int r;
4417 int sz;
4418 if (stbi__get8(s) != 'B') return 0;
4419 if (stbi__get8(s) != 'M') return 0;
4420 stbi__get32le(s); // discard filesize
4421 stbi__get16le(s); // discard reserved
4422 stbi__get16le(s); // discard reserved
4423 stbi__get32le(s); // discard data offset
4424 sz = stbi__get32le(s);
4425 r = (sz == 12 || sz == 40 || sz == 56 || sz == 108 || sz == 124);
4426 return r;
4429 static int stbi__bmp_test(stbi__context *s)
4431 int r = stbi__bmp_test_raw(s);
4432 stbi__rewind(s);
4433 return r;
4437 // returns 0..31 for the highest set bit
4438 static int stbi__high_bit(unsigned int z)
4440 int n=0;
4441 if (z == 0) return -1;
4442 if (z >= 0x10000) n += 16, z >>= 16;
4443 if (z >= 0x00100) n += 8, z >>= 8;
4444 if (z >= 0x00010) n += 4, z >>= 4;
4445 if (z >= 0x00004) n += 2, z >>= 2;
4446 if (z >= 0x00002) n += 1, z >>= 1;
4447 return n;
4450 static int stbi__bitcount(unsigned int a)
4452 a = (a & 0x55555555) + ((a >> 1) & 0x55555555); // max 2
4453 a = (a & 0x33333333) + ((a >> 2) & 0x33333333); // max 4
4454 a = (a + (a >> 4)) & 0x0f0f0f0f; // max 8 per 4, now 8 bits
4455 a = (a + (a >> 8)); // max 16 per 8 bits
4456 a = (a + (a >> 16)); // max 32 per 8 bits
4457 return a & 0xff;
4460 static int stbi__shiftsigned(int v, int shift, int bits)
4462 int result;
4463 int z=0;
4465 if (shift < 0) v <<= -shift;
4466 else v >>= shift;
4467 result = v;
4469 z = bits;
4470 while (z < 8) {
4471 result += v >> z;
4472 z += bits;
4474 return result;
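The three helpers above combine to widen a masked BMP channel to 8 bits: shift the field so its top bit lands at bit 7, then replicate the bits downward to fill the low end. The sketch below shows that flow in one function. The names are illustrative, the loop-based `high_bit` and `bit_count` are deliberately simpler than the branch-reduced versions above, and the mask is assumed to be a contiguous run of bits.

```c
#include <assert.h>

/* Index of the highest set bit, or -1 for zero (simple loop version). */
static int high_bit(unsigned int z)
{
    int n = -1;
    while (z) { ++n; z >>= 1; }
    return n;
}

/* Number of set bits (simple loop version). */
static int bit_count(unsigned int a)
{
    int n = 0;
    while (a) { n += (int)(a & 1u); a >>= 1; }
    return n;
}

/* Extract the channel selected by mask from pixel and expand it to 0..255. */
static unsigned char expand_channel(unsigned int pixel, unsigned int mask)
{
    int shift, bits, z;
    unsigned int v, result;
    assert(mask != 0);
    shift = high_bit(mask) - 7;      /* right shift that puts the top bit at bit 7 */
    bits  = bit_count(mask);
    v = pixel & mask;
    v = (shift < 0) ? (v << -shift) : (v >> shift);
    result = v;
    for (z = bits; z < 8; z += bits) /* replicate the field into the low bits */
        result += v >> z;
    return (unsigned char)(result & 255u);
}
```

Replicating the bits rather than just shifting maps a full-scale 5-bit value (31) to 255 instead of 248.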
4477 static stbi_uc *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
4479 stbi_uc *out;
4480 unsigned int mr=0,mg=0,mb=0,ma=0, fake_a=0;
4481 stbi_uc pal[256][4];
4482 int psize=0,i,j,compress=0,width;
4483 int bpp, flip_vertically, pad, target, offset, hsz;
4484 if (stbi__get8(s) != 'B' || stbi__get8(s) != 'M') return stbi__errpuc("not BMP", "Corrupt BMP");
4485 stbi__get32le(s); // discard filesize
4486 stbi__get16le(s); // discard reserved
4487 stbi__get16le(s); // discard reserved
4488 offset = stbi__get32le(s);
4489 hsz = stbi__get32le(s);
4490 if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108 && hsz != 124) return stbi__errpuc("unknown BMP", "BMP type not supported: unknown");
4491 if (hsz == 12) {
4492 s->img_x = stbi__get16le(s);
4493 s->img_y = stbi__get16le(s);
4494 } else {
4495 s->img_x = stbi__get32le(s);
4496 s->img_y = stbi__get32le(s);
4498 if (stbi__get16le(s) != 1) return stbi__errpuc("bad BMP", "bad BMP");
4499 bpp = stbi__get16le(s);
4500 if (bpp == 1) return stbi__errpuc("monochrome", "BMP type not supported: 1-bit");
4501 flip_vertically = ((int) s->img_y) > 0;
4502 s->img_y = abs((int) s->img_y);
4503 if (hsz == 12) {
4504 if (bpp < 24)
4505 psize = (offset - 14 - 24) / 3;
4506 } else {
4507 compress = stbi__get32le(s);
4508 if (compress == 1 || compress == 2) return stbi__errpuc("BMP RLE", "BMP type not supported: RLE");
4509 stbi__get32le(s); // discard sizeof
4510 stbi__get32le(s); // discard hres
4511 stbi__get32le(s); // discard vres
4512 stbi__get32le(s); // discard colorsused
4513 stbi__get32le(s); // discard max important
4514 if (hsz == 40 || hsz == 56) {
4515 if (hsz == 56) {
4516 stbi__get32le(s);
4517 stbi__get32le(s);
4518 stbi__get32le(s);
4519 stbi__get32le(s);
4521 if (bpp == 16 || bpp == 32) {
4522 mr = mg = mb = 0;
4523 if (compress == 0) {
4524 if (bpp == 32) {
4525 mr = 0xffu << 16;
4526 mg = 0xffu << 8;
4527 mb = 0xffu << 0;
4528 ma = 0xffu << 24;
4529 fake_a = 1; // @TODO: check for cases like alpha value is all 0 and switch it to 255
4530 STBI_NOTUSED(fake_a);
4531 } else {
4532 mr = 31u << 10;
4533 mg = 31u << 5;
4534 mb = 31u << 0;
4536 } else if (compress == 3) {
4537 mr = stbi__get32le(s);
4538 mg = stbi__get32le(s);
4539 mb = stbi__get32le(s);
4540 // not documented, but generated by photoshop and handled by mspaint
4541 if (mr == mg && mg == mb) {
4542 // ?!?!?
4543 return stbi__errpuc("bad BMP", "bad BMP");
4545 } else
4546 return stbi__errpuc("bad BMP", "bad BMP");
4548 } else {
4549 STBI_ASSERT(hsz == 108 || hsz == 124);
4550 mr = stbi__get32le(s);
4551 mg = stbi__get32le(s);
4552 mb = stbi__get32le(s);
4553 ma = stbi__get32le(s);
4554 stbi__get32le(s); // discard color space
4555 for (i=0; i < 12; ++i)
4556 stbi__get32le(s); // discard color space parameters
4557 if (hsz == 124) {
4558 stbi__get32le(s); // discard rendering intent
4559 stbi__get32le(s); // discard offset of profile data
4560 stbi__get32le(s); // discard size of profile data
4561 stbi__get32le(s); // discard reserved
4564 if (bpp < 16)
4565 psize = (offset - 14 - hsz) >> 2;
4567 s->img_n = ma ? 4 : 3;
4568 if (req_comp && req_comp >= 3) // we can directly decode 3 or 4
4569 target = req_comp;
4570 else
4571 target = s->img_n; // if they want monochrome, we'll post-convert
4572 out = (stbi_uc *) stbi__malloc(target * s->img_x * s->img_y);
4573 if (!out) return stbi__errpuc("outofmem", "Out of memory");
4574 if (bpp < 16) {
4575 int z=0;
4576 if (psize == 0 || psize > 256) { STBI_FREE(out); return stbi__errpuc("invalid", "Corrupt BMP"); }
4577 for (i=0; i < psize; ++i) {
4578 pal[i][2] = stbi__get8(s);
4579 pal[i][1] = stbi__get8(s);
4580 pal[i][0] = stbi__get8(s);
4581 if (hsz != 12) stbi__get8(s);
4582 pal[i][3] = 255;
4584 stbi__skip(s, offset - 14 - hsz - psize * (hsz == 12 ? 3 : 4));
4585 if (bpp == 4) width = (s->img_x + 1) >> 1;
4586 else if (bpp == 8) width = s->img_x;
4587 else { STBI_FREE(out); return stbi__errpuc("bad bpp", "Corrupt BMP"); }
4588 pad = (-width)&3;
4589 for (j=0; j < (int) s->img_y; ++j) {
4590 for (i=0; i < (int) s->img_x; i += 2) {
4591 int v=stbi__get8(s),v2=0;
4592 if (bpp == 4) {
4593 v2 = v & 15;
4594 v >>= 4;
4596 out[z++] = pal[v][0];
4597 out[z++] = pal[v][1];
4598 out[z++] = pal[v][2];
4599 if (target == 4) out[z++] = 255;
4600 if (i+1 == (int) s->img_x) break;
4601 v = (bpp == 8) ? stbi__get8(s) : v2;
4602 out[z++] = pal[v][0];
4603 out[z++] = pal[v][1];
4604 out[z++] = pal[v][2];
4605 if (target == 4) out[z++] = 255;
4607 stbi__skip(s, pad);
4609 } else {
4610 int rshift=0,gshift=0,bshift=0,ashift=0,rcount=0,gcount=0,bcount=0,acount=0;
4611 int z = 0;
4612 int easy=0;
4613 stbi__skip(s, offset - 14 - hsz);
4614 if (bpp == 24) width = 3 * s->img_x;
4615 else if (bpp == 16) width = 2*s->img_x;
4616 else /* bpp = 32 and pad = 0 */ width=0;
4617 pad = (-width) & 3;
4618 if (bpp == 24) {
4619 easy = 1;
4620 } else if (bpp == 32) {
4621 if (mb == 0xff && mg == 0xff00 && mr == 0x00ff0000 && ma == 0xff000000)
4622 easy = 2;
4624 if (!easy) {
4625 if (!mr || !mg || !mb) { STBI_FREE(out); return stbi__errpuc("bad masks", "Corrupt BMP"); }
4626 // right shift amt to put high bit in position #7
4627 rshift = stbi__high_bit(mr)-7; rcount = stbi__bitcount(mr);
4628 gshift = stbi__high_bit(mg)-7; gcount = stbi__bitcount(mg);
4629 bshift = stbi__high_bit(mb)-7; bcount = stbi__bitcount(mb);
4630 ashift = stbi__high_bit(ma)-7; acount = stbi__bitcount(ma);
4632 for (j=0; j < (int) s->img_y; ++j) {
4633 if (easy) {
4634 for (i=0; i < (int) s->img_x; ++i) {
4635 unsigned char a;
4636 out[z+2] = stbi__get8(s);
4637 out[z+1] = stbi__get8(s);
4638 out[z+0] = stbi__get8(s);
4639 z += 3;
4640 a = (easy == 2 ? stbi__get8(s) : 255);
4641 if (target == 4) out[z++] = a;
4643 } else {
4644 for (i=0; i < (int) s->img_x; ++i) {
4645 stbi__uint32 v = (stbi__uint32) (bpp == 16 ? stbi__get16le(s) : stbi__get32le(s));
4646 int a;
4647 out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mr, rshift, rcount));
4648 out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mg, gshift, gcount));
4649 out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mb, bshift, bcount));
4650 a = (ma ? stbi__shiftsigned(v & ma, ashift, acount) : 255);
4651 if (target == 4) out[z++] = STBI__BYTECAST(a);
4654 stbi__skip(s, pad);
4657 if (flip_vertically) {
4658 stbi_uc t;
4659 for (j=0; j < (int) s->img_y>>1; ++j) {
4660 stbi_uc *p1 = out + j *s->img_x*target;
4661 stbi_uc *p2 = out + (s->img_y-1-j)*s->img_x*target;
4662 for (i=0; i < (int) s->img_x*target; ++i) {
4663 t = p1[i], p1[i] = p2[i], p2[i] = t;
4668 if (req_comp && req_comp != target) {
4669 out = stbi__convert_format(out, target, req_comp, s->img_x, s->img_y);
4670 if (out == NULL) return out; // stbi__convert_format frees input on failure
4673 *x = s->img_x;
4674 *y = s->img_y;
4675 if (comp) *comp = s->img_n;
4676 return out;
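BMP rows are padded to a multiple of four bytes, which is what the `pad = (-width) & 3` expressions in the loader compute. A minimal sketch of the same calculation, written without negating a signed value; `bmp_row_bytes` and `bmp_row_pad` are hypothetical helper names:

```c
#include <assert.h>

/* Bytes of pixel data in one row, before padding. */
static int bmp_row_bytes(int width, int bpp)
{
    assert(width > 0 && bpp > 0);
    return (width * bpp + 7) / 8;
}

/* Padding bytes needed to round row_bytes up to a multiple of 4. */
static int bmp_row_pad(int row_bytes)
{
    assert(row_bytes >= 0);
    return (4 - (row_bytes & 3)) & 3;
}
```

For example, a 3-pixel-wide 24-bit row holds 9 data bytes and needs 3 bytes of padding, while 32-bit rows never need any.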
4678 #endif
4680 // Targa Truevision - TGA
4681 // by Jonathan Dummer
4682 #ifndef STBI_NO_TGA
4683 static int stbi__tga_info(stbi__context *s, int *x, int *y, int *comp)
4685 int tga_w, tga_h, tga_comp;
4686 int sz;
4687 stbi__get8(s); // discard Offset
4688 sz = stbi__get8(s); // color type
4689 if( sz > 1 ) {
4690 stbi__rewind(s);
4691 return 0; // only RGB or indexed allowed
4693 sz = stbi__get8(s); // image type
4694 // only RGB or grey allowed, +/- RLE
4695 if ((sz != 1) && (sz != 2) && (sz != 3) && (sz != 9) && (sz != 10) && (sz != 11)) { stbi__rewind(s); return 0; } // rewind on failure, like the other early exits
4696 stbi__skip(s,9);
4697 tga_w = stbi__get16le(s);
4698 if( tga_w < 1 ) {
4699 stbi__rewind(s);
4700 return 0; // test width
4702 tga_h = stbi__get16le(s);
4703 if( tga_h < 1 ) {
4704 stbi__rewind(s);
4705 return 0; // test height
4707 sz = stbi__get8(s); // bits per pixel
4708 // only RGB or RGBA or grey allowed
4709 if ((sz != 8) && (sz != 16) && (sz != 24) && (sz != 32)) {
4710 stbi__rewind(s);
4711 return 0;
4713 tga_comp = sz;
4714 if (x) *x = tga_w;
4715 if (y) *y = tga_h;
4716 if (comp) *comp = tga_comp / 8;
4717 return 1; // seems to have passed everything
4720 static int stbi__tga_test(stbi__context *s)
4722 int res;
4723 int sz;
4724 stbi__get8(s); // discard Offset
4725 sz = stbi__get8(s); // color type
4726 if ( sz > 1 ) return 0; // only RGB or indexed allowed
4727 sz = stbi__get8(s); // image type
4728 if ( (sz != 1) && (sz != 2) && (sz != 3) && (sz != 9) && (sz != 10) && (sz != 11) ) return 0; // only RGB or grey allowed, +/- RLE
4729 stbi__get16be(s); // discard palette start
4730 stbi__get16be(s); // discard palette length
4731 stbi__get8(s); // discard bits per palette color entry
4732 stbi__get16be(s); // discard x origin
4733 stbi__get16be(s); // discard y origin
4734 if ( stbi__get16be(s) < 1 ) return 0; // test width
4735 if ( stbi__get16be(s) < 1 ) return 0; // test height
4736 sz = stbi__get8(s); // bits per pixel
4737 if ( (sz != 8) && (sz != 16) && (sz != 24) && (sz != 32) )
4738 res = 0;
4739 else
4740 res = 1;
4741 stbi__rewind(s);
4742 return res;
4745 static stbi_uc *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
4747 // read in the TGA header stuff
4748 int tga_offset = stbi__get8(s);
4749 int tga_indexed = stbi__get8(s);
4750 int tga_image_type = stbi__get8(s);
4751 int tga_is_RLE = 0;
4752 int tga_palette_start = stbi__get16le(s);
4753 int tga_palette_len = stbi__get16le(s);
4754 int tga_palette_bits = stbi__get8(s);
4755 int tga_x_origin = stbi__get16le(s);
4756 int tga_y_origin = stbi__get16le(s);
4757 int tga_width = stbi__get16le(s);
4758 int tga_height = stbi__get16le(s);
4759 int tga_bits_per_pixel = stbi__get8(s);
4760 int tga_comp = tga_bits_per_pixel / 8;
4761 int tga_inverted = stbi__get8(s);
4762 // image data
4763 unsigned char *tga_data;
4764 unsigned char *tga_palette = NULL;
4765 int i, j;
4766 unsigned char raw_data[4];
4767 int RLE_count = 0;
4768 int RLE_repeating = 0;
4769 int read_next_pixel = 1;
4771 // do a tiny bit of processing
4772 if ( tga_image_type >= 8 )
4774 tga_image_type -= 8;
4775 tga_is_RLE = 1;
4777 /* int tga_alpha_bits = tga_inverted & 15; */
4778 tga_inverted = 1 - ((tga_inverted >> 5) & 1);
4780 // error check
4781 if ( //(tga_indexed) ||
4782 (tga_width < 1) || (tga_height < 1) ||
4783 (tga_image_type < 1) || (tga_image_type > 3) ||
4784 ((tga_bits_per_pixel != 8) && (tga_bits_per_pixel != 16) &&
4785 (tga_bits_per_pixel != 24) && (tga_bits_per_pixel != 32))
4788 return NULL; // we don't report this as a bad TGA because we don't even know if it's TGA
4791 // If I'm paletted, then I'll use the number of bits from the palette
4792 if ( tga_indexed )
4794 tga_comp = tga_palette_bits / 8;
4797 // tga info
4798 *x = tga_width;
4799 *y = tga_height;
4800 if (comp) *comp = tga_comp;
4802 tga_data = (unsigned char*)stbi__malloc( tga_width * tga_height * tga_comp );
4803 if (!tga_data) return stbi__errpuc("outofmem", "Out of memory");
4805 // skip to the data's starting position (offset usually = 0)
4806 stbi__skip(s, tga_offset );
4808 if ( !tga_indexed && !tga_is_RLE) {
4809 for (i=0; i < tga_height; ++i) {
4810 int y = tga_inverted ? tga_height -i - 1 : i;
4811 stbi_uc *tga_row = tga_data + y*tga_width*tga_comp;
4812 stbi__getn(s, tga_row, tga_width * tga_comp);
4814 } else {
4815 // do I need to load a palette?
4816 if ( tga_indexed)
4818 // any data to skip? (offset usually = 0)
4819 stbi__skip(s, tga_palette_start );
4820 // load the palette
4821 tga_palette = (unsigned char*)stbi__malloc( tga_palette_len * tga_palette_bits / 8 );
4822 if (!tga_palette) {
4823 STBI_FREE(tga_data);
4824 return stbi__errpuc("outofmem", "Out of memory");
4826 if (!stbi__getn(s, tga_palette, tga_palette_len * tga_palette_bits / 8 )) {
4827 STBI_FREE(tga_data);
4828 STBI_FREE(tga_palette);
4829 return stbi__errpuc("bad palette", "Corrupt TGA");
      // load the data
      for (i=0; i < tga_width * tga_height; ++i)
      {
         // if I'm in RLE mode, do I need to get a new RLE packet?
         if ( tga_is_RLE )
         {
            if ( RLE_count == 0 )
            {
               // yep, get the next byte as a RLE command
               int RLE_cmd = stbi__get8(s);
               RLE_count = 1 + (RLE_cmd & 127);
               RLE_repeating = RLE_cmd >> 7;
               read_next_pixel = 1;
            } else if ( !RLE_repeating )
            {
               read_next_pixel = 1;
            }
         } else
         {
            read_next_pixel = 1;
         }
         // OK, if I need to read a pixel, do it now
         if ( read_next_pixel )
         {
            // load however much data we did have
            if ( tga_indexed )
            {
               // read in 1 byte, then perform the lookup
               int pal_idx = stbi__get8(s);
               if ( pal_idx >= tga_palette_len )
               {
                  // invalid index
                  pal_idx = 0;
               }
               pal_idx *= tga_bits_per_pixel / 8;
               for (j = 0; j*8 < tga_bits_per_pixel; ++j)
               {
                  raw_data[j] = tga_palette[pal_idx+j];
               }
            } else
            {
               // read in the data raw
               for (j = 0; j*8 < tga_bits_per_pixel; ++j)
               {
                  raw_data[j] = stbi__get8(s);
               }
            }
            // clear the reading flag for the next pixel
            read_next_pixel = 0;
         } // end of reading a pixel

         // copy data
         for (j = 0; j < tga_comp; ++j)
            tga_data[i*tga_comp+j] = raw_data[j];

         // in case we're in RLE mode, keep counting down
         --RLE_count;
      }
4890 // do I need to invert the image?
4891 if ( tga_inverted )
4893 for (j = 0; j*2 < tga_height; ++j)
4895 int index1 = j * tga_width * tga_comp;
4896 int index2 = (tga_height - 1 - j) * tga_width * tga_comp;
4897 for (i = tga_width * tga_comp; i > 0; --i)
4899 unsigned char temp = tga_data[index1];
4900 tga_data[index1] = tga_data[index2];
4901 tga_data[index2] = temp;
4902 ++index1;
4903 ++index2;
4907 // clear my palette, if I had one
4908 if ( tga_palette != NULL )
4910 STBI_FREE( tga_palette );
4914 // swap RGB
4915 if (tga_comp >= 3)
4917 unsigned char* tga_pixel = tga_data;
4918 for (i=0; i < tga_width * tga_height; ++i)
4920 unsigned char temp = tga_pixel[0];
4921 tga_pixel[0] = tga_pixel[2];
4922 tga_pixel[2] = temp;
4923 tga_pixel += tga_comp;
4927 // convert to target component count
4928 if (req_comp && req_comp != tga_comp)
4929 tga_data = stbi__convert_format(tga_data, tga_comp, req_comp, tga_width, tga_height);
4931 // the things I do to get rid of an error message, and yet keep
4932 // Microsoft's C compilers happy... [8^(
4933 tga_palette_start = tga_palette_len = tga_palette_bits =
4934 tga_x_origin = tga_y_origin = 0;
4935 // OK, done
4936 return tga_data;
4938 #endif
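// The RLE packet header handled by the TGA loader above (count = 1 + (cmd & 127),
// repeat flag = cmd >> 7) can be sketched as a standalone decoder over a memory
// buffer. This is only an illustration under stated assumptions: the name
// tga_rle_decode and its parameters are hypothetical and not part of stb_image,
// and unlike the loader above it rejects a packet that would run past the end
// of the image instead of truncating it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper, not part of stb_image.
// Each packet starts with a header byte: bit 7 set means "run" (one pixel
// repeated), bit 7 clear means "raw" (literal pixels follow); the low 7 bits
// hold the pixel count minus one.
// Returns the number of pixels written, or 0 on truncated or overlong input.
static size_t tga_rle_decode(const unsigned char *src, size_t src_len,
                             unsigned char *dst, size_t pixel_count,
                             size_t bytes_per_pixel)
{
   size_t in = 0, out = 0;
   assert(bytes_per_pixel >= 1 && bytes_per_pixel <= 4);
   while (out < pixel_count) {
      size_t count, k;
      int repeating;
      if (in >= src_len) return 0;                 // ran out of input
      count = 1 + (size_t)(src[in] & 127);
      repeating = src[in] >> 7;
      ++in;
      if (count > pixel_count - out) return 0;     // packet overruns the image
      if (repeating) {
         if (src_len - in < bytes_per_pixel) return 0;
         for (k = 0; k < count; ++k)
            memcpy(dst + (out + k) * bytes_per_pixel, src + in, bytes_per_pixel);
         in += bytes_per_pixel;
      } else {
         if (src_len - in < count * bytes_per_pixel) return 0;
         memcpy(dst + out * bytes_per_pixel, src + in, count * bytes_per_pixel);
         in += count * bytes_per_pixel;
      }
      out += count;
   }
   return out;
}
```

// Usage: for an 8-bit image, the bytes { 0x82, 7, 0x01, 1, 2 } describe a run
// of three pixels with value 7 followed by the two raw pixels 1 and 2.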
4940 // *************************************************************************************************
4941 // Photoshop PSD loader -- PD by Thatcher Ulrich, integration by Nicolas Schulz, tweaked by STB
4943 #ifndef STBI_NO_PSD
4944 static int stbi__psd_test(stbi__context *s)
4946 int r = (stbi__get32be(s) == 0x38425053);
4947 stbi__rewind(s);
4948 return r;
4951 static stbi_uc *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
4953 int pixelCount;
4954 int channelCount, compression;
4955 int channel, i, count, len;
4956 int w,h;
4957 stbi_uc *out;
4959 // Check identifier
4960 if (stbi__get32be(s) != 0x38425053) // "8BPS"
4961 return stbi__errpuc("not PSD", "Corrupt PSD image");
4963 // Check file type version.
4964 if (stbi__get16be(s) != 1)
4965 return stbi__errpuc("wrong version", "Unsupported version of PSD image");
4967 // Skip 6 reserved bytes.
4968 stbi__skip(s, 6 );
4970 // Read the number of channels (R, G, B, A, etc).
4971 channelCount = stbi__get16be(s);
4972 if (channelCount < 0 || channelCount > 16)
4973 return stbi__errpuc("wrong channel count", "Unsupported number of channels in PSD image");
4975 // Read the rows and columns of the image.
4976 h = stbi__get32be(s);
4977 w = stbi__get32be(s);
4979 // Make sure the depth is 8 bits.
4980 if (stbi__get16be(s) != 8)
4981 return stbi__errpuc("unsupported bit depth", "PSD bit depth is not 8 bit");
4983 // Make sure the color mode is RGB.
4984 // Valid options are:
4985 // 0: Bitmap
4986 // 1: Grayscale
4987 // 2: Indexed color
4988 // 3: RGB color
4989 // 4: CMYK color
4990 // 7: Multichannel
4991 // 8: Duotone
4992 // 9: Lab color
4993 if (stbi__get16be(s) != 3)
4994 return stbi__errpuc("wrong color format", "PSD is not in RGB color format");
4996 // Skip the Mode Data. (It's the palette for indexed color; other info for other modes.)
4997 stbi__skip(s,stbi__get32be(s) );
4999 // Skip the image resources. (resolution, pen tool paths, etc)
5000 stbi__skip(s, stbi__get32be(s) );
5002 // Skip the reserved data.
5003 stbi__skip(s, stbi__get32be(s) );
5005 // Find out if the data is compressed.
5006 // Known values:
5007 // 0: no compression
5008 // 1: RLE compressed
5009 compression = stbi__get16be(s);
5010 if (compression > 1)
5011 return stbi__errpuc("bad compression", "PSD has an unknown compression format");
5013 // Create the destination image.
5014 out = (stbi_uc *) stbi__malloc(4 * w*h);
5015 if (!out) return stbi__errpuc("outofmem", "Out of memory");
5016 pixelCount = w*h;
5018 // Initialize the data to zero.
5019 //memset( out, 0, pixelCount * 4 );
5021 // Finally, the image data.
5022 if (compression) {
      // RLE as used by .PSD and .TIFF
      // Loop until you get the number of unpacked bytes you are expecting:
      //     Read the next source byte into n.
      //     If n is between 0 and 127 inclusive, copy the next n+1 bytes literally.
      //     Else if n is between -127 and -1 inclusive, copy the next byte -n+1 times.
      //     Else if n is -128 (byte value 128), noop.
      // Endloop

      // The RLE-compressed data is preceded by a 2-byte data count for each row in the data,
      // which we're going to just skip.
      stbi__skip(s, h * channelCount * 2 );
5035 // Read the RLE data by channel.
5036 for (channel = 0; channel < 4; channel++) {
5037 stbi_uc *p;
5039 p = out+channel;
5040 if (channel >= channelCount) {
5041 // Fill this channel with default data.
5042 for (i = 0; i < pixelCount; i++) *p = (channel == 3 ? 255 : 0), p += 4;
5043 } else {
5044 // Read the RLE data.
5045 count = 0;
5046 while (count < pixelCount) {
5047 len = stbi__get8(s);
5048 if (len == 128) {
5049 // No-op.
5050 } else if (len < 128) {
5051 // Copy next len+1 bytes literally.
5052 len++;
5053 count += len;
5054 while (len) {
5055 *p = stbi__get8(s);
5056 p += 4;
5057 len--;
5059 } else if (len > 128) {
5060 stbi_uc val;
5061 // Next -len+1 bytes in the dest are replicated from next source byte.
5062 // (Interpret len as a negative 8-bit int.)
5063 len ^= 0x0FF;
5064 len += 2;
5065 val = stbi__get8(s);
5066 count += len;
5067 while (len) {
5068 *p = val;
5069 p += 4;
5070 len--;
   } else {
      // We're at the raw image data. It's each channel in order (Red, Green, Blue, Alpha, ...)
      // where each channel consists of an 8-bit value for each pixel in the image.

      // Read the data by channel.
      for (channel = 0; channel < 4; channel++) {
         stbi_uc *p;

         p = out + channel;
         // use >= here, matching the RLE path: channel indices run 0..channelCount-1
         if (channel >= channelCount) {
            // Fill this channel with default data.
            for (i = 0; i < pixelCount; i++) *p = channel == 3 ? 255 : 0, p += 4;
         } else {
            // Read the data.
            for (i = 0; i < pixelCount; i++)
               *p = stbi__get8(s), p += 4;
         }
      }
   }
5097 if (req_comp && req_comp != 4) {
5098 out = stbi__convert_format(out, 4, req_comp, w, h);
5099 if (out == NULL) return out; // stbi__convert_format frees input on failure
5102 if (comp) *comp = channelCount;
5103 *y = h;
5104 *x = w;
5106 return out;
5108 #endif
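// The PackBits-style RLE scheme described in the PSD loader's comments can be
// sketched as a byte-oriented decoder. This is a simplified illustration, not
// the loader's code: packbits_decode and its parameters are hypothetical names,
// it writes contiguous bytes rather than every fourth byte of an RGBA buffer,
// and it adds bounds checks that the loader above does not perform.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper, not part of stb_image.
// Returns the number of bytes written (dst_len on success), or (size_t)-1 if
// the input is truncated or a run would overflow dst.
static size_t packbits_decode(const unsigned char *src, size_t src_len,
                              unsigned char *dst, size_t dst_len)
{
   size_t in = 0, out = 0;
   assert(src != NULL && dst != NULL);
   while (out < dst_len) {
      unsigned n;
      size_t len, k;
      if (in >= src_len) return (size_t)-1;
      n = src[in++];
      if (n == 128) {
         continue;                       // -128 as a signed byte: no-op
      } else if (n < 128) {
         len = (size_t)n + 1;            // copy the next n+1 bytes literally
         if (len > dst_len - out || len > src_len - in) return (size_t)-1;
         for (k = 0; k < len; ++k) dst[out++] = src[in++];
      } else {
         len = 257 - (size_t)n;          // -n+1 with n read as a signed byte
         if (len > dst_len - out || in >= src_len) return (size_t)-1;
         for (k = 0; k < len; ++k) dst[out++] = src[in];
         ++in;
      }
   }
   return out;
}
```

// The expression 257 - n is the same count the loader computes with
// (len ^ 0xFF) + 2 for header bytes above 128.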
5110 // *************************************************************************************************
5111 // Softimage PIC loader
5112 // by Tom Seddon
5114 // See http://softimage.wiki.softimage.com/index.php/INFO:_PIC_file_format
5115 // See http://ozviz.wasp.uwa.edu.au/~pbourke/dataformats/softimagepic/
5117 #ifndef STBI_NO_PIC
5118 static int stbi__pic_is4(stbi__context *s,const char *str)
5120 int i;
5121 for (i=0; i<4; ++i)
5122 if (stbi__get8(s) != (stbi_uc)str[i])
5123 return 0;
5125 return 1;
5128 static int stbi__pic_test_core(stbi__context *s)
5130 int i;
5132 if (!stbi__pic_is4(s,"\x53\x80\xF6\x34"))
5133 return 0;
5135 for(i=0;i<84;++i)
5136 stbi__get8(s);
5138 if (!stbi__pic_is4(s,"PICT"))
5139 return 0;
5141 return 1;
5144 typedef struct
5146 stbi_uc size,type,channel;
5147 } stbi__pic_packet;
5149 static stbi_uc *stbi__readval(stbi__context *s, int channel, stbi_uc *dest)
5151 int mask=0x80, i;
5153 for (i=0; i<4; ++i, mask>>=1) {
5154 if (channel & mask) {
5155 if (stbi__at_eof(s)) return stbi__errpuc("bad file","PIC file too short");
5156 dest[i]=stbi__get8(s);
5160 return dest;
5163 static void stbi__copyval(int channel,stbi_uc *dest,const stbi_uc *src)
5165 int mask=0x80,i;
5167 for (i=0;i<4; ++i, mask>>=1)
5168 if (channel&mask)
5169 dest[i]=src[i];
5172 static stbi_uc *stbi__pic_load_core(stbi__context *s,int width,int height,int *comp, stbi_uc *result)
5174 int act_comp=0,num_packets=0,y,chained;
5175 stbi__pic_packet packets[10];
5177 // this will (should...) cater for even some bizarre stuff like having data
5178 // for the same channel in multiple packets.
5179 do {
5180 stbi__pic_packet *packet;
5182 if (num_packets==sizeof(packets)/sizeof(packets[0]))
5183 return stbi__errpuc("bad format","too many packets");
5185 packet = &packets[num_packets++];
5187 chained = stbi__get8(s);
5188 packet->size = stbi__get8(s);
5189 packet->type = stbi__get8(s);
5190 packet->channel = stbi__get8(s);
5192 act_comp |= packet->channel;
5194 if (stbi__at_eof(s)) return stbi__errpuc("bad file","file too short (reading packets)");
5195 if (packet->size != 8) return stbi__errpuc("bad format","packet isn't 8bpp");
5196 } while (chained);
5198 *comp = (act_comp & 0x10 ? 4 : 3); // has alpha channel?
5200 for(y=0; y<height; ++y) {
5201 int packet_idx;
5203 for(packet_idx=0; packet_idx < num_packets; ++packet_idx) {
5204 stbi__pic_packet *packet = &packets[packet_idx];
5205 stbi_uc *dest = result+y*width*4;
5207 switch (packet->type) {
5208 default:
5209 return stbi__errpuc("bad format","packet has bad compression type");
5211 case 0: {//uncompressed
5212 int x;
5214 for(x=0;x<width;++x, dest+=4)
5215 if (!stbi__readval(s,packet->channel,dest))
5216 return 0;
5217 break;
5220 case 1://Pure RLE
5222 int left=width, i;
5224 while (left>0) {
5225 stbi_uc count,value[4];
5227 count=stbi__get8(s);
5228 if (stbi__at_eof(s)) return stbi__errpuc("bad file","file too short (pure read count)");
5230 if (count > left)
5231 count = (stbi_uc) left;
5233 if (!stbi__readval(s,packet->channel,value)) return 0;
5235 for(i=0; i<count; ++i,dest+=4)
5236 stbi__copyval(packet->channel,dest,value);
5237 left -= count;
5240 break;
5242 case 2: {//Mixed RLE
5243 int left=width;
5244 while (left>0) {
5245 int count = stbi__get8(s), i;
5246 if (stbi__at_eof(s)) return stbi__errpuc("bad file","file too short (mixed read count)");
5248 if (count >= 128) { // Repeated
5249 stbi_uc value[4];
5250 int i;
5252 if (count==128)
5253 count = stbi__get16be(s);
5254 else
5255 count -= 127;
5256 if (count > left)
5257 return stbi__errpuc("bad file","scanline overrun");
5259 if (!stbi__readval(s,packet->channel,value))
5260 return 0;
5262 for(i=0;i<count;++i, dest += 4)
5263 stbi__copyval(packet->channel,dest,value);
5264 } else { // Raw
5265 ++count;
5266 if (count>left) return stbi__errpuc("bad file","scanline overrun");
5268 for(i=0;i<count;++i, dest+=4)
5269 if (!stbi__readval(s,packet->channel,dest))
5270 return 0;
5272 left-=count;
5274 break;
5280 return result;
static stbi_uc *stbi__pic_load(stbi__context *s,int *px,int *py,int *comp,int req_comp)
{
   stbi_uc *result;
   int i, x,y;

   for (i=0; i<92; ++i)
      stbi__get8(s);

   x = stbi__get16be(s);
   y = stbi__get16be(s);
   if (stbi__at_eof(s)) return stbi__errpuc("bad file","file too short (pic header)");
   // guard the division against a zero width, as stbi__pic_info does
   if (x != 0 && (1 << 28) / x < y) return stbi__errpuc("too large", "Image too large to decode");

   stbi__get32be(s); //skip `ratio'
   stbi__get16be(s); //skip `fields'
   stbi__get16be(s); //skip `pad'

   // intermediate buffer is RGBA
   result = (stbi_uc *) stbi__malloc(x*y*4);
   if (!result) return stbi__errpuc("outofmem", "Out of memory");
   memset(result, 0xff, x*y*4);

   if (!stbi__pic_load_core(s,x,y,comp, result)) {
      STBI_FREE(result);
      return 0; // don't pass a freed buffer on to stbi__convert_format
   }
   *px = x;
   *py = y;
   if (req_comp == 0) req_comp = *comp;
   result=stbi__convert_format(result,4,req_comp,x,y);

   return result;
}
5316 static int stbi__pic_test(stbi__context *s)
5318 int r = stbi__pic_test_core(s);
5319 stbi__rewind(s);
5320 return r;
5322 #endif
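// The size check used by the PIC loader, (1 << 28) / x < y, rejects images whose
// 4-byte-per-pixel buffer would exceed 2^30 bytes without ever computing x*y,
// which could overflow an int. A minimal sketch of the same idea follows; the
// name pic_dims_fit is hypothetical, and rejecting zero or negative dimensions
// is an assumption of this sketch rather than something the loader does.

```c
#include <assert.h>

// Hypothetical helper, not part of stb_image.
// Returns 1 when width*height <= 2^28, so width*height*4 fits in an int;
// returns 0 otherwise, including for non-positive dimensions.
static int pic_dims_fit(int width, int height)
{
   if (width <= 0 || height <= 0) return 0;   // also avoids dividing by zero
   // integer division: height <= floor(2^28 / width) iff width*height <= 2^28
   return (1 << 28) / width >= height;
}
```

// Dividing the limit by one dimension and comparing against the other keeps
// every intermediate value within the range of int.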
5324 // *************************************************************************************************
5325 // GIF loader -- public domain by Jean-Marc Lienher -- simplified/shrunk by stb
5327 #ifndef STBI_NO_GIF
5328 typedef struct
5330 stbi__int16 prefix;
5331 stbi_uc first;
5332 stbi_uc suffix;
5333 } stbi__gif_lzw;
5335 typedef struct
5337 int w,h;
5338 stbi_uc *out; // output buffer (always 4 components)
5339 int flags, bgindex, ratio, transparent, eflags;
5340 stbi_uc pal[256][4];
5341 stbi_uc lpal[256][4];
5342 stbi__gif_lzw codes[4096];
5343 stbi_uc *color_table;
5344 int parse, step;
5345 int lflags;
5346 int start_x, start_y;
5347 int max_x, max_y;
5348 int cur_x, cur_y;
5349 int line_size;
5350 } stbi__gif;
5352 static int stbi__gif_test_raw(stbi__context *s)
5354 int sz;
5355 if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8') return 0;
5356 sz = stbi__get8(s);
5357 if (sz != '9' && sz != '7') return 0;
5358 if (stbi__get8(s) != 'a') return 0;
5359 return 1;
5362 static int stbi__gif_test(stbi__context *s)
5364 int r = stbi__gif_test_raw(s);
5365 stbi__rewind(s);
5366 return r;
5369 static void stbi__gif_parse_colortable(stbi__context *s, stbi_uc pal[256][4], int num_entries, int transp)
5371 int i;
5372 for (i=0; i < num_entries; ++i) {
5373 pal[i][2] = stbi__get8(s);
5374 pal[i][1] = stbi__get8(s);
5375 pal[i][0] = stbi__get8(s);
5376 pal[i][3] = transp == i ? 0 : 255;
5380 static int stbi__gif_header(stbi__context *s, stbi__gif *g, int *comp, int is_info)
5382 stbi_uc version;
5383 if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8')
5384 return stbi__err("not GIF", "Corrupt GIF");
5386 version = stbi__get8(s);
5387 if (version != '7' && version != '9') return stbi__err("not GIF", "Corrupt GIF");
5388 if (stbi__get8(s) != 'a') return stbi__err("not GIF", "Corrupt GIF");
5390 stbi__g_failure_reason = "";
5391 g->w = stbi__get16le(s);
5392 g->h = stbi__get16le(s);
5393 g->flags = stbi__get8(s);
5394 g->bgindex = stbi__get8(s);
5395 g->ratio = stbi__get8(s);
5396 g->transparent = -1;
5398 if (comp != 0) *comp = 4; // can't actually tell whether it's 3 or 4 until we parse the comments
5400 if (is_info) return 1;
5402 if (g->flags & 0x80)
5403 stbi__gif_parse_colortable(s,g->pal, 2 << (g->flags & 7), -1);
5405 return 1;
5408 static int stbi__gif_info_raw(stbi__context *s, int *x, int *y, int *comp)
5410 stbi__gif g;
5411 if (!stbi__gif_header(s, &g, comp, 1)) {
5412 stbi__rewind( s );
5413 return 0;
5415 if (x) *x = g.w;
5416 if (y) *y = g.h;
5417 return 1;
static void stbi__out_gif_code(stbi__gif *g, stbi__uint16 code)
{
   stbi_uc *p, *c;

   // recurse to decode the prefixes, since the linked-list is backwards,
   // and working backwards through an interleaved image would be nasty
   if (g->codes[code].prefix >= 0)
      stbi__out_gif_code(g, g->codes[code].prefix);

   if (g->cur_y >= g->max_y) return;

   p = &g->out[g->cur_x + g->cur_y];
   c = &g->color_table[g->codes[code].suffix * 4];

   if (c[3] >= 128) {
      p[0] = c[2];
      p[1] = c[1];
      p[2] = c[0];
      p[3] = c[3];
   }
   g->cur_x += 4;

   if (g->cur_x >= g->max_x) {
      g->cur_x = g->start_x;
      g->cur_y += g->step;

      while (g->cur_y >= g->max_y && g->parse > 0) {
         g->step = (1 << g->parse) * g->line_size;
         g->cur_y = g->start_y + (g->step >> 1);
         --g->parse;
      }
   }
}
5454 static stbi_uc *stbi__process_gif_raster(stbi__context *s, stbi__gif *g)
5456 stbi_uc lzw_cs;
5457 stbi__int32 len, code;
5458 stbi__uint32 first;
5459 stbi__int32 codesize, codemask, avail, oldcode, bits, valid_bits, clear;
5460 stbi__gif_lzw *p;
5462 lzw_cs = stbi__get8(s);
5463 clear = 1 << lzw_cs;
5464 first = 1;
5465 codesize = lzw_cs + 1;
5466 codemask = (1 << codesize) - 1;
5467 bits = 0;
5468 valid_bits = 0;
5469 for (code = 0; code < clear; code++) {
5470 g->codes[code].prefix = -1;
5471 g->codes[code].first = (stbi_uc) code;
5472 g->codes[code].suffix = (stbi_uc) code;
5475 // support no starting clear code
5476 avail = clear+2;
5477 oldcode = -1;
5479 len = 0;
5480 for(;;) {
5481 if (valid_bits < codesize) {
5482 if (len == 0) {
5483 len = stbi__get8(s); // start new block
5484 if (len == 0)
5485 return g->out;
5487 --len;
5488 bits |= (stbi__int32) stbi__get8(s) << valid_bits;
5489 valid_bits += 8;
5490 } else {
5491 stbi__int32 code = bits & codemask;
5492 bits >>= codesize;
5493 valid_bits -= codesize;
5494 // @OPTIMIZE: is there some way we can accelerate the non-clear path?
5495 if (code == clear) { // clear code
5496 codesize = lzw_cs + 1;
5497 codemask = (1 << codesize) - 1;
5498 avail = clear + 2;
5499 oldcode = -1;
5500 first = 0;
5501 } else if (code == clear + 1) { // end of stream code
5502 stbi__skip(s, len);
5503 while ((len = stbi__get8(s)) > 0)
5504 stbi__skip(s,len);
5505 return g->out;
5506 } else if (code <= avail) {
5507 if (first) return stbi__errpuc("no clear code", "Corrupt GIF");
5509 if (oldcode >= 0) {
5510 p = &g->codes[avail++];
5511 if (avail > 4096) return stbi__errpuc("too many codes", "Corrupt GIF");
5512 p->prefix = (stbi__int16) oldcode;
5513 p->first = g->codes[oldcode].first;
5514 p->suffix = (code == avail) ? p->first : g->codes[code].first;
5515 } else if (code == avail)
5516 return stbi__errpuc("illegal code in raster", "Corrupt GIF");
5518 stbi__out_gif_code(g, (stbi__uint16) code);
5520 if ((avail & codemask) == 0 && avail <= 0x0FFF) {
5521 codesize++;
5522 codemask = (1 << codesize) - 1;
5525 oldcode = code;
5526 } else {
5527 return stbi__errpuc("illegal code in raster", "Corrupt GIF");
5533 static void stbi__fill_gif_background(stbi__gif *g)
5535 int i;
5536 stbi_uc *c = g->pal[g->bgindex];
5537 // @OPTIMIZE: write a dword at a time
5538 for (i = 0; i < g->w * g->h * 4; i += 4) {
5539 stbi_uc *p = &g->out[i];
5540 p[0] = c[2];
5541 p[1] = c[1];
5542 p[2] = c[0];
5543 p[3] = c[3];
5547 // this function is designed to support animated gifs, although stb_image doesn't support it
5548 static stbi_uc *stbi__gif_load_next(stbi__context *s, stbi__gif *g, int *comp, int req_comp)
5550 int i;
5551 stbi_uc *old_out = 0;
5553 if (g->out == 0) {
5554 if (!stbi__gif_header(s, g, comp,0)) return 0; // stbi__g_failure_reason set by stbi__gif_header
5555 g->out = (stbi_uc *) stbi__malloc(4 * g->w * g->h);
5556 if (g->out == 0) return stbi__errpuc("outofmem", "Out of memory");
5557 stbi__fill_gif_background(g);
5558 } else {
5559 // animated-gif-only path
5560 if (((g->eflags & 0x1C) >> 2) == 3) {
5561 old_out = g->out;
5562 g->out = (stbi_uc *) stbi__malloc(4 * g->w * g->h);
5563 if (g->out == 0) return stbi__errpuc("outofmem", "Out of memory");
5564 memcpy(g->out, old_out, g->w*g->h*4);
5568 for (;;) {
5569 switch (stbi__get8(s)) {
5570 case 0x2C: /* Image Descriptor */
5572 stbi__int32 x, y, w, h;
5573 stbi_uc *o;
5575 x = stbi__get16le(s);
5576 y = stbi__get16le(s);
5577 w = stbi__get16le(s);
5578 h = stbi__get16le(s);
5579 if (((x + w) > (g->w)) || ((y + h) > (g->h)))
5580 return stbi__errpuc("bad Image Descriptor", "Corrupt GIF");
5582 g->line_size = g->w * 4;
5583 g->start_x = x * 4;
5584 g->start_y = y * g->line_size;
5585 g->max_x = g->start_x + w * 4;
5586 g->max_y = g->start_y + h * g->line_size;
5587 g->cur_x = g->start_x;
5588 g->cur_y = g->start_y;
5590 g->lflags = stbi__get8(s);
5592 if (g->lflags & 0x40) {
5593 g->step = 8 * g->line_size; // first interlaced spacing
5594 g->parse = 3;
5595 } else {
5596 g->step = g->line_size;
5597 g->parse = 0;
5600 if (g->lflags & 0x80) {
5601 stbi__gif_parse_colortable(s,g->lpal, 2 << (g->lflags & 7), g->eflags & 0x01 ? g->transparent : -1);
5602 g->color_table = (stbi_uc *) g->lpal;
5603 } else if (g->flags & 0x80) {
            for (i=0; i < 256; ++i) // @OPTIMIZE: reset only the previous transparent
               g->pal[i][3] = 255;
5606 if (g->transparent >= 0 && (g->eflags & 0x01))
5607 g->pal[g->transparent][3] = 0;
5608 g->color_table = (stbi_uc *) g->pal;
5609 } else
5610 return stbi__errpuc("missing color table", "Corrupt GIF");
5612 o = stbi__process_gif_raster(s, g);
5613 if (o == NULL) return NULL;
5615 if (req_comp && req_comp != 4)
5616 o = stbi__convert_format(o, 4, req_comp, g->w, g->h);
5617 return o;
5620 case 0x21: // Comment Extension.
5622 int len;
5623 if (stbi__get8(s) == 0xF9) { // Graphic Control Extension.
5624 len = stbi__get8(s);
5625 if (len == 4) {
5626 g->eflags = stbi__get8(s);
5627 stbi__get16le(s); // delay
5628 g->transparent = stbi__get8(s);
5629 } else {
5630 stbi__skip(s, len);
5631 break;
5634 while ((len = stbi__get8(s)) != 0)
5635 stbi__skip(s, len);
5636 break;
5639 case 0x3B: // gif stream termination code
5640 return (stbi_uc *) s; // using '1' causes warning on some compilers
5642 default:
5643 return stbi__errpuc("unknown code", "Corrupt GIF");
5648 static stbi_uc *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
5650 stbi_uc *u = 0;
5651 stbi__gif g;
5652 memset(&g, 0, sizeof(g));
5654 u = stbi__gif_load_next(s, &g, comp, req_comp);
5655 if (u == (stbi_uc *) s) u = 0; // end of animated gif marker
5656 if (u) {
5657 *x = g.w;
5658 *y = g.h;
5661 return u;
5664 static int stbi__gif_info(stbi__context *s, int *x, int *y, int *comp)
5666 return stbi__gif_info_raw(s,x,y,comp);
5668 #endif
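// The interlace handling in stbi__out_gif_code (step starts at 8 rows, and each
// later pass uses step = 1 << parse, starting at step/2) can be sketched in
// terms of row indices instead of byte offsets. The function name
// gif_interlace_order is hypothetical; it illustrates the row ordering only
// and does not touch pixel data.

```c
#include <assert.h>

// Hypothetical helper, not part of stb_image.
// Fills order[0..height-1] with the destination row for each successive
// decoded row of an interlaced GIF frame.
static void gif_interlace_order(int height, int *order)
{
   int step = 8, parse = 3, row = 0, n;
   assert(height >= 0);
   for (n = 0; n < height; ++n) {
      // when a pass runs off the bottom, start the next, finer pass
      while (row >= height && parse > 0) {
         step = 1 << parse;      // 8, then 4, then 2
         row = step >> 1;        // starting at row 4, then 2, then 1
         --parse;
      }
      order[n] = row;
      row += step;
   }
}
```

// For a frame of height 10, the decoded rows land on rows
// 0, 8, 4, 2, 6, 1, 3, 5, 7, 9 in that order.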
5670 // *************************************************************************************************
5671 // Radiance RGBE HDR loader
5672 // originally by Nicolas Schulz
5673 #ifndef STBI_NO_HDR
5674 static int stbi__hdr_test_core(stbi__context *s)
5676 const char *signature = "#?RADIANCE\n";
5677 int i;
5678 for (i=0; signature[i]; ++i)
5679 if (stbi__get8(s) != signature[i])
5680 return 0;
5681 return 1;
5684 static int stbi__hdr_test(stbi__context* s)
5686 int r = stbi__hdr_test_core(s);
5687 stbi__rewind(s);
5688 return r;
5691 #define STBI__HDR_BUFLEN 1024
static char *stbi__hdr_gettoken(stbi__context *z, char *buffer)
{
   int len=0;
   char c = '\0';

   c = (char) stbi__get8(z);

   while (!stbi__at_eof(z) && c != '\n') {
      buffer[len++] = c;
      if (len == STBI__HDR_BUFLEN-1) {
         // flush to end of line
         while (!stbi__at_eof(z) && stbi__get8(z) != '\n')
            ;
         break;
      }
      c = (char) stbi__get8(z);
   }

   buffer[len] = 0;
   return buffer;
}
static void stbi__hdr_convert(float *output, stbi_uc *input, int req_comp)
{
   if ( input[3] != 0 ) {
      float f1;
      // Exponent
      f1 = (float) ldexp(1.0f, input[3] - (int)(128 + 8));
      if (req_comp <= 2)
         output[0] = (input[0] + input[1] + input[2]) * f1 / 3;
      else {
         output[0] = input[0] * f1;
         output[1] = input[1] * f1;
         output[2] = input[2] * f1;
      }
      if (req_comp == 2) output[1] = 1;
      if (req_comp == 4) output[3] = 1;
   } else {
      switch (req_comp) {
         case 4: output[3] = 1; /* fallthrough */
         case 3: output[0] = output[1] = output[2] = 0;
                 break;
         case 2: output[1] = 1; /* fallthrough */
         case 1: output[0] = 0;
                 break;
      }
   }
}
5741 static float *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
5743 char buffer[STBI__HDR_BUFLEN];
5744 char *token;
5745 int valid = 0;
5746 int width, height;
5747 stbi_uc *scanline;
5748 float *hdr_data;
5749 int len;
5750 unsigned char count, value;
5751 int i, j, k, c1,c2, z;
5754 // Check identifier
5755 if (strcmp(stbi__hdr_gettoken(s,buffer), "#?RADIANCE") != 0)
5756 return stbi__errpf("not HDR", "Corrupt HDR image");
5758 // Parse header
5759 for(;;) {
5760 token = stbi__hdr_gettoken(s,buffer);
5761 if (token[0] == 0) break;
5762 if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
5765 if (!valid) return stbi__errpf("unsupported format", "Unsupported HDR format");
5767 // Parse width and height
5768 // can't use sscanf() if we're not using stdio!
5769 token = stbi__hdr_gettoken(s,buffer);
5770 if (strncmp(token, "-Y ", 3)) return stbi__errpf("unsupported data layout", "Unsupported HDR format");
5771 token += 3;
5772 height = (int) strtol(token, &token, 10);
5773 while (*token == ' ') ++token;
5774 if (strncmp(token, "+X ", 3)) return stbi__errpf("unsupported data layout", "Unsupported HDR format");
5775 token += 3;
5776 width = (int) strtol(token, NULL, 10);
5778 *x = width;
5779 *y = height;
5781 if (comp) *comp = 3;
5782 if (req_comp == 0) req_comp = 3;
   // Read data
   hdr_data = (float *) stbi__malloc(height * width * req_comp * sizeof(float));
   if (!hdr_data) return stbi__errpf("outofmem", "Out of memory");
5787 // Load image data
5788 // image data is stored as some number of sca
5789 if ( width < 8 || width >= 32768) {
5790 // Read flat data
5791 for (j=0; j < height; ++j) {
5792 for (i=0; i < width; ++i) {
5793 stbi_uc rgbe[4];
5794 main_decode_loop:
5795 stbi__getn(s, rgbe, 4);
5796 stbi__hdr_convert(hdr_data + j * width * req_comp + i * req_comp, rgbe, req_comp);
5799 } else {
5800 // Read RLE-encoded data
5801 scanline = NULL;
5803 for (j = 0; j < height; ++j) {
5804 c1 = stbi__get8(s);
5805 c2 = stbi__get8(s);
5806 len = stbi__get8(s);
5807 if (c1 != 2 || c2 != 2 || (len & 0x80)) {
5808 // not run-length encoded, so we have to actually use THIS data as a decoded
5809 // pixel (note this can't be a valid pixel--one of RGB must be >= 128)
5810 stbi_uc rgbe[4];
5811 rgbe[0] = (stbi_uc) c1;
5812 rgbe[1] = (stbi_uc) c2;
5813 rgbe[2] = (stbi_uc) len;
5814 rgbe[3] = (stbi_uc) stbi__get8(s);
5815 stbi__hdr_convert(hdr_data, rgbe, req_comp);
5816 i = 1;
5817 j = 0;
5818 STBI_FREE(scanline);
5819 goto main_decode_loop; // yes, this makes no sense
5821 len <<= 8;
5822 len |= stbi__get8(s);
5823 if (len != width) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("invalid decoded scanline length", "corrupt HDR"); }
5824 if (scanline == NULL) scanline = (stbi_uc *) stbi__malloc(width * 4);
5826 for (k = 0; k < 4; ++k) {
5827 i = 0;
5828 while (i < width) {
5829 count = stbi__get8(s);
5830 if (count > 128) {
5831 // Run
5832 value = stbi__get8(s);
5833 count -= 128;
5834 for (z = 0; z < count; ++z)
5835 scanline[i++ * 4 + k] = value;
5836 } else {
5837 // Dump
5838 for (z = 0; z < count; ++z)
5839 scanline[i++ * 4 + k] = stbi__get8(s);
5843 for (i=0; i < width; ++i)
5844 stbi__hdr_convert(hdr_data+(j*width + i)*req_comp, scanline + i*4, req_comp);
5846 STBI_FREE(scanline);
5849 return hdr_data;
5852 static int stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp)
5854 char buffer[STBI__HDR_BUFLEN];
5855 char *token;
5856 int valid = 0;
5858 if (strcmp(stbi__hdr_gettoken(s,buffer), "#?RADIANCE") != 0) {
5859 stbi__rewind( s );
5860 return 0;
5863 for(;;) {
5864 token = stbi__hdr_gettoken(s,buffer);
5865 if (token[0] == 0) break;
5866 if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
5869 if (!valid) {
5870 stbi__rewind( s );
5871 return 0;
5873 token = stbi__hdr_gettoken(s,buffer);
5874 if (strncmp(token, "-Y ", 3)) {
5875 stbi__rewind( s );
5876 return 0;
5878 token += 3;
5879 *y = (int) strtol(token, &token, 10);
5880 while (*token == ' ') ++token;
5881 if (strncmp(token, "+X ", 3)) {
5882 stbi__rewind( s );
5883 return 0;
5885 token += 3;
5886 *x = (int) strtol(token, NULL, 10);
5887 *comp = 3;
5888 return 1;
5890 #endif // STBI_NO_HDR
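// The RGBE-to-float conversion in stbi__hdr_convert scales each 8-bit mantissa
// by 2^(E - 136), where the 136 combines the exponent bias of 128 with the 8
// mantissa bits. A minimal three-channel sketch follows; rgbe_to_rgb is a
// hypothetical name and covers only the RGB output case, not the 1-, 2- or
// 4-component outputs handled above.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

// Hypothetical helper, not part of stb_image.
// Converts one Radiance RGBE pixel to three linear float channels.
static void rgbe_to_rgb(const unsigned char rgbe[4], float rgb[3])
{
   assert(rgbe != NULL && rgb != NULL);
   if (rgbe[3] == 0) {
      // a zero exponent byte encodes black
      rgb[0] = rgb[1] = rgb[2] = 0.0f;
   } else {
      // shared scale factor: 2^(E - 128 - 8)
      float scale = (float) ldexp(1.0, (int) rgbe[3] - (128 + 8));
      rgb[0] = rgbe[0] * scale;
      rgb[1] = rgbe[1] * scale;
      rgb[2] = rgbe[2] * scale;
   }
}
```

// Usage: the pixel { 128, 64, 0, 129 } has scale 2^-7, so it decodes to
// (1.0, 0.5, 0.0).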
5892 #ifndef STBI_NO_BMP
5893 static int stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp)
5895 int hsz;
5896 if (stbi__get8(s) != 'B' || stbi__get8(s) != 'M') {
5897 stbi__rewind( s );
5898 return 0;
5900 stbi__skip(s,12);
5901 hsz = stbi__get32le(s);
5902 if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108 && hsz != 124) {
5903 stbi__rewind( s );
5904 return 0;
5906 if (hsz == 12) {
5907 *x = stbi__get16le(s);
5908 *y = stbi__get16le(s);
5909 } else {
5910 *x = stbi__get32le(s);
5911 *y = stbi__get32le(s);
5913 if (stbi__get16le(s) != 1) {
5914 stbi__rewind( s );
5915 return 0;
5917 *comp = stbi__get16le(s) / 8;
5918 return 1;
5920 #endif
5922 #ifndef STBI_NO_PSD
5923 static int stbi__psd_info(stbi__context *s, int *x, int *y, int *comp)
5925 int channelCount;
5926 if (stbi__get32be(s) != 0x38425053) {
5927 stbi__rewind( s );
5928 return 0;
5930 if (stbi__get16be(s) != 1) {
5931 stbi__rewind( s );
5932 return 0;
5934 stbi__skip(s, 6);
5935 channelCount = stbi__get16be(s);
5936 if (channelCount < 0 || channelCount > 16) {
5937 stbi__rewind( s );
5938 return 0;
5940 *y = stbi__get32be(s);
5941 *x = stbi__get32be(s);
5942 if (stbi__get16be(s) != 8) {
5943 stbi__rewind( s );
5944 return 0;
5946 if (stbi__get16be(s) != 3) {
5947 stbi__rewind( s );
5948 return 0;
5950 *comp = 4;
5951 return 1;
5953 #endif
5955 #ifndef STBI_NO_PIC
5956 static int stbi__pic_info(stbi__context *s, int *x, int *y, int *comp)
5958 int act_comp=0,num_packets=0,chained;
5959 stbi__pic_packet packets[10];
5961 stbi__skip(s, 92);
5963 *x = stbi__get16be(s);
5964 *y = stbi__get16be(s);
5965 if (stbi__at_eof(s)) return 0;
5966 if ( (*x) != 0 && (1 << 28) / (*x) < (*y)) {
5967 stbi__rewind( s );
5968 return 0;
5971 stbi__skip(s, 8);
5973 do {
5974 stbi__pic_packet *packet;
5976 if (num_packets==sizeof(packets)/sizeof(packets[0]))
5977 return 0;
5979 packet = &packets[num_packets++];
5980 chained = stbi__get8(s);
5981 packet->size = stbi__get8(s);
5982 packet->type = stbi__get8(s);
5983 packet->channel = stbi__get8(s);
5984 act_comp |= packet->channel;
5986 if (stbi__at_eof(s)) {
5987 stbi__rewind( s );
5988 return 0;
5990 if (packet->size != 8) {
5991 stbi__rewind( s );
5992 return 0;
5994 } while (chained);
5996 *comp = (act_comp & 0x10 ? 4 : 3);
5998 return 1;
6000 #endif
6002 // *************************************************************************************************
6003 // Portable Gray Map and Portable Pixel Map loader
6004 // by Ken Miller
6006 // PGM: http://netpbm.sourceforge.net/doc/pgm.html
6007 // PPM: http://netpbm.sourceforge.net/doc/ppm.html
6009 // Known limitations:
6010 // Does not support comments in the header section
6011 // Does not support ASCII image data (formats P2 and P3)
6012 // Does not support 16-bit-per-channel
6014 #ifndef STBI_NO_PNM
6016 static int stbi__pnm_test(stbi__context *s)
6018 char p, t;
6019 p = (char) stbi__get8(s);
6020 t = (char) stbi__get8(s);
6021 if (p != 'P' || (t != '5' && t != '6')) {
6022 stbi__rewind( s );
6023 return 0;
6025 return 1;
6028 static stbi_uc *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
6030 stbi_uc *out;
6031 if (!stbi__pnm_info(s, (int *)&s->img_x, (int *)&s->img_y, (int *)&s->img_n))
6032 return 0;
6033 *x = s->img_x;
6034 *y = s->img_y;
6035 *comp = s->img_n;
6037 out = (stbi_uc *) stbi__malloc(s->img_n * s->img_x * s->img_y);
6038 if (!out) return stbi__errpuc("outofmem", "Out of memory");
6039 stbi__getn(s, out, s->img_n * s->img_x * s->img_y);
6041 if (req_comp && req_comp != s->img_n) {
6042 out = stbi__convert_format(out, s->img_n, req_comp, s->img_x, s->img_y);
6043 if (out == NULL) return out; // stbi__convert_format frees input on failure
6045 return out;
6048 static int stbi__pnm_isspace(char c)
6050 return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r';
6053 static void stbi__pnm_skip_whitespace(stbi__context *s, char *c)
6055 while (!stbi__at_eof(s) && stbi__pnm_isspace(*c))
6056 *c = (char) stbi__get8(s);
6059 static int stbi__pnm_isdigit(char c)
6061 return c >= '0' && c <= '9';
6064 static int stbi__pnm_getinteger(stbi__context *s, char *c)
6066 int value = 0;
6068 while (!stbi__at_eof(s) && stbi__pnm_isdigit(*c)) {
6069 value = value*10 + (*c - '0');
6070 *c = (char) stbi__get8(s);
6073 return value;

static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp)
{
   int maxv;
   char c, p, t;

   stbi__rewind( s );

   // Get identifier
   p = (char) stbi__get8(s);
   t = (char) stbi__get8(s);
   if (p != 'P' || (t != '5' && t != '6')) {
       stbi__rewind( s );
       return 0;
   }

   *comp = (t == '6') ? 3 : 1;  // '5' is 1-component .pgm; '6' is 3-component .ppm

   c = (char) stbi__get8(s);
   stbi__pnm_skip_whitespace(s, &c);

   *x = stbi__pnm_getinteger(s, &c); // read width
   stbi__pnm_skip_whitespace(s, &c);

   *y = stbi__pnm_getinteger(s, &c); // read height
   stbi__pnm_skip_whitespace(s, &c);

   maxv = stbi__pnm_getinteger(s, &c);  // read max value

   if (maxv > 255)
      return stbi__err("max value > 255", "PPM image not 8-bit");
   else
      return 1;
}
#endif

static int stbi__info_main(stbi__context *s, int *x, int *y, int *comp)
{
   #ifndef STBI_NO_JPEG
   if (stbi__jpeg_info(s, x, y, comp)) return 1;
   #endif

   #ifndef STBI_NO_PNG
   if (stbi__png_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_GIF
   if (stbi__gif_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_BMP
   if (stbi__bmp_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_PSD
   if (stbi__psd_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_PIC
   if (stbi__pic_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_PNM
   if (stbi__pnm_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_HDR
   if (stbi__hdr_info(s, x, y, comp))  return 1;
   #endif

   // test tga last because it's a crappy test!
   #ifndef STBI_NO_TGA
   if (stbi__tga_info(s, x, y, comp))
       return 1;
   #endif
   return stbi__err("unknown image type", "Image not of any known type, or corrupt");
}

#ifndef STBI_NO_STDIO
STBIDEF int stbi_info(char const *filename, int *x, int *y, int *comp)
{
    FILE *f = stbi__fopen(filename, "rb");
    int result;
    if (!f) return stbi__err("can't fopen", "Unable to open file");
    result = stbi_info_from_file(f, x, y, comp);
    fclose(f);
    return result;
}

STBIDEF int stbi_info_from_file(FILE *f, int *x, int *y, int *comp)
{
   int r;
   stbi__context s;
   long pos = ftell(f);
   stbi__start_file(&s, f);
   r = stbi__info_main(&s,x,y,comp);
   fseek(f,pos,SEEK_SET);
   return r;
}
#endif // !STBI_NO_STDIO
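
// stbi_info_from_file saves the stream position with ftell and restores it
// with fseek, so probing the header does not disturb the caller's read
// position. The same save/restore pattern in isolation, as a standalone
// sketch: peek_bytes is a hypothetical helper, not an stb_image function, and
// unlike the code above it also checks the ftell and fseek results.

```c
#include <assert.h>
#include <stdio.h>

// Hypothetical helper: reads up to n bytes at the current position, then
// seeks back to where the stream started. Returns the number of bytes read,
// or -1 if the position could not be saved or restored.
static long peek_bytes(FILE *f, unsigned char *buf, size_t n)
{
   long pos, got;
   assert(f != NULL && buf != NULL);
   pos = ftell(f);
   if (pos < 0) return -1;
   got = (long) fread(buf, 1, n, f);
   if (fseek(f, pos, SEEK_SET) != 0) return -1;
   return got;
}
```

// This only works on seekable streams; for pipes or sockets ftell fails and
// the helper reports -1 without reading.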

STBIDEF int stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp)
{
   stbi__context s;
   stbi__start_mem(&s,buffer,len);
   return stbi__info_main(&s,x,y,comp);
}

STBIDEF int stbi_info_from_callbacks(stbi_io_callbacks const *c, void *user, int *x, int *y, int *comp)
{
   stbi__context s;
   stbi__start_callbacks(&s, (stbi_io_callbacks *) c, user);
   return stbi__info_main(&s,x,y,comp);
}

#endif // STB_IMAGE_IMPLEMENTATION

/*
   revision history:
      2.02  (2015-01-19) fix incorrect assert, fix warning
      2.01  (2015-01-17) fix various warnings; suppress SIMD on gcc 32-bit without -msse2
      2.00b (2014-12-25) fix STBI_MALLOC in progressive JPEG
      2.00  (2014-12-25) optimize JPG, including x86 SSE2 & NEON SIMD (ryg)
                         progressive JPEG (stb)
                         PGM/PPM support (Ken Miller)
                         STBI_MALLOC,STBI_REALLOC,STBI_FREE
                         GIF bugfix -- seemingly never worked
                         STBI_NO_*, STBI_ONLY_*
      1.48  (2014-12-14) fix incorrectly-named assert()
      1.47  (2014-12-14) 1/2/4-bit PNG support, both direct and paletted (Omar Cornut & stb)
                         optimize PNG (ryg)
                         fix bug in interlaced PNG with user-specified channel count (stb)
      1.46  (2014-08-26)
              fix broken tRNS chunk (colorkey-style transparency) in non-paletted PNG
      1.45  (2014-08-16)
              fix MSVC-ARM internal compiler error by wrapping malloc
      1.44  (2014-08-07)
              various warning fixes from Ronny Chevalier
      1.43  (2014-07-15)
              fix MSVC-only compiler problem in code changed in 1.42
      1.42  (2014-07-09)
              don't define _CRT_SECURE_NO_WARNINGS (affects user code)
              fixes to stbi__cleanup_jpeg path
              added STBI_ASSERT to avoid requiring assert.h
      1.41  (2014-06-25)
              fix search&replace from 1.36 that messed up comments/error messages
      1.40  (2014-06-22)
              fix gcc struct-initialization warning
      1.39  (2014-06-15)
              fix to TGA optimization when req_comp != number of components in TGA;
              fix to GIF loading because BMP wasn't rewinding (whoops, no GIFs in my test suite)
              add support for BMP version 5 (more ignored fields)
      1.38  (2014-06-06)
              suppress MSVC warnings on integer casts truncating values
              fix accidental rename of 'skip' field of I/O
      1.37  (2014-06-04)
              remove duplicate typedef
      1.36  (2014-06-03)
              convert to header file single-file library
              if de-iphone isn't set, load iphone images color-swapped instead of returning NULL
      1.35  (2014-05-27)
              various warnings
              fix broken STBI_SIMD path
              fix bug where stbi_load_from_file no longer left file pointer in correct place
              fix broken non-easy path for 32-bit BMP (possibly never used)
              TGA optimization by Arseny Kapoulkine
      1.34  (unknown)
              use STBI_NOTUSED in stbi__resample_row_generic(), fix one more leak in tga failure case
      1.33  (2011-07-14)
              make stbi_is_hdr work in STBI_NO_HDR (as specified), minor compiler-friendly improvements
      1.32  (2011-07-13)
              support for "info" function for all supported filetypes (SpartanJ)
      1.31  (2011-06-20)
              a few more leak fixes, bug in PNG handling (SpartanJ)
      1.30  (2011-06-11)
              added ability to load files via callbacks to accommodate custom input streams (Ben Wenger)
              removed deprecated format-specific test/load functions
              removed support for installable file formats (stbi_loader) -- would have been broken for IO callbacks anyway
              error cases in bmp and tga give messages and don't leak (Raymond Barbiero, grisha)
              fix inefficiency in decoding 32-bit BMP (David Woo)
      1.29  (2010-08-16)
              various warning fixes from Aurelien Pocheville
      1.28  (2010-08-01)
              fix bug in GIF palette transparency (SpartanJ)
      1.27  (2010-08-01)
              cast-to-stbi_uc to fix warnings
      1.26  (2010-07-24)
              fix bug in file buffering for PNG reported by SpartanJ
      1.25  (2010-07-17)
              refix trans_data warning (Won Chun)
      1.24  (2010-07-12)
              perf improvements reading from files on platforms with lock-heavy fgetc()
              minor perf improvements for jpeg
              deprecated type-specific functions so we'll get feedback if they're needed
              attempt to fix trans_data warning (Won Chun)
      1.23    fixed bug in iPhone support
      1.22  (2010-07-10)
              removed image *writing* support
              stbi_info support from Jetro Lauha
              GIF support from Jean-Marc Lienher
              iPhone PNG-extensions from James Brown
              warning-fixes from Nicolas Schulz and Janez Zemva (i.e. Janez (U+017D)emva)
      1.21    fix use of 'stbi_uc' in header (reported by jon blow)
      1.20    added support for Softimage PIC, by Tom Seddon
      1.19    bug in interlaced PNG corruption check (found by ryg)
      1.18  (2008-08-02)
              fix a threading bug (local mutable static)
      1.17    support interlaced PNG
      1.16    major bugfix - stbi__convert_format converted one too many pixels
      1.15    initialize some fields for thread safety
      1.14    fix threadsafe conversion bug
              header-file-only version (#define STBI_HEADER_FILE_ONLY before including)
      1.13    threadsafe
      1.12    const qualifiers in the API
      1.11    Support installable IDCT, colorspace conversion routines
      1.10    Fixes for 64-bit (don't use "unsigned long")
              optimized upsampling by Fabian "ryg" Giesen
      1.09    Fix format-conversion for PSD code (bad global variables!)
      1.08    Thatcher Ulrich's PSD code integrated by Nicolas Schulz
      1.07    attempt to fix C++ warning/errors again
      1.06    attempt to fix C++ warning/errors again
      1.05    fix TGA loading to return correct *comp and use good luminance calc
      1.04    default float alpha is 1, not 255; use 'void *' for stbi_image_free
      1.03    bugfixes to STBI_NO_STDIO, STBI_NO_HDR
      1.02    support for (subset of) HDR files, float interface for preferred access to them
      1.01    fix bug: possible bug in handling right-side up bmps... not sure
              fix bug: the stbi__bmp_load() and stbi__tga_load() functions didn't work at all
      1.00    interface to zlib that skips zlib header
      0.99    correct handling of alpha in palette
      0.98    TGA loader by lonesock; dynamically add loaders (untested)
      0.97    jpeg errors on too large a file; also catch another malloc failure
      0.96    fix detection of invalid v value - particleman@mollyrocket forum
      0.95    during header scan, seek to markers in case of padding
      0.94    STBI_NO_STDIO to disable stdio usage; rename all #defines the same
      0.93    handle jpegtran output; verbose errors
      0.92    read 4,8,16,24,32-bit BMP files of several formats
      0.91    output 24-bit Windows 3.0 BMP files
      0.90    fix a few more warnings; bump version number to approach 1.0
      0.61    bugfixes due to Marc LeBlanc, Christopher Lloyd
      0.60    fix compiling as c++
      0.59    fix warnings: merge Dave Moore's -Wall fixes
      0.58    fix bug: zlib uncompressed mode len/nlen was wrong endian
      0.57    fix bug: jpg last huffman symbol before marker was >9 bits but less than 16 available
      0.56    fix bug: zlib uncompressed mode len vs. nlen
      0.55    fix bug: restart_interval not initialized to 0
      0.54    allow NULL for 'int *comp'
      0.53    fix bug in png 3->4; speedup png decoding
      0.52    png handles req_comp=3,4 directly; minor cleanup; jpeg comments
      0.51    obey req_comp requests, 1-component jpegs return as 1-component,
              on 'test' only check type, not whether we support this variant
      0.50    first released version
*/