hs

Unnamed repository; edit this file 'description' to name the repository.
Log | Files | Refs | Submodules | LICENSE

stb_image.h (278901B)


      1 /* stb_image - v2.27 - public domain image loader - http://nothings.org/stb
      2                                   no warranty implied; use at your own risk
      3 
      4    Do this:
      5       #define STB_IMAGE_IMPLEMENTATION
      6    before you include this file in *one* C or C++ file to create the implementation.
      7 
      8    // i.e. it should look like this:
      9    #include ...
     10    #include ...
     11    #include ...
     12    #define STB_IMAGE_IMPLEMENTATION
     13    #include "stb_image.h"
     14 
     15    You can #define STBI_ASSERT(x) before the #include to avoid using assert.h.
     16    And #define STBI_MALLOC, STBI_REALLOC, and STBI_FREE to avoid using malloc,realloc,free
     17 
     18 
     19    QUICK NOTES:
     20       Primarily of interest to game developers and other people who can
     21           avoid problematic images and only need the trivial interface
     22 
     23       JPEG baseline & progressive (12 bpc/arithmetic not supported, same as stock IJG lib)
     24       PNG 1/2/4/8/16-bit-per-channel
     25 
     26       TGA (not sure what subset, if a subset)
     27       BMP non-1bpp, non-RLE
     28       PSD (composited view only, no extra channels, 8/16 bit-per-channel)
     29 
     30       GIF (*comp always reports as 4-channel)
     31       HDR (radiance rgbE format)
     32       PIC (Softimage PIC)
     33       PNM (PPM and PGM binary only)
     34 
     35       Animated GIF still needs a proper API, but here's one way to do it:
     36           http://gist.github.com/urraka/685d9a6340b26b830d49
     37 
     38       - decode from memory or through FILE (define STBI_NO_STDIO to remove code)
     39       - decode from arbitrary I/O callbacks
     40       - SIMD acceleration on x86/x64 (SSE2) and ARM (NEON)
     41 
     42    Full documentation under "DOCUMENTATION" below.
     43 
     44 
     45 LICENSE
     46 
     47   See end of file for license information.
     48 
     49 RECENT REVISION HISTORY:
     50 
     51       2.27  (2021-07-11) document stbi_info better, 16-bit PNM support, bug fixes
     52       2.26  (2020-07-13) many minor fixes
     53       2.25  (2020-02-02) fix warnings
     54       2.24  (2020-02-02) fix warnings; thread-local failure_reason and flip_vertically
     55       2.23  (2019-08-11) fix clang static analysis warning
     56       2.22  (2019-03-04) gif fixes, fix warnings
     57       2.21  (2019-02-25) fix typo in comment
     58       2.20  (2019-02-07) support utf8 filenames in Windows; fix warnings and platform ifdefs
     59       2.19  (2018-02-11) fix warning
     60       2.18  (2018-01-30) fix warnings
     61       2.17  (2018-01-29) bugfix, 1-bit BMP, 16-bitness query, fix warnings
     62       2.16  (2017-07-23) all functions have 16-bit variants; optimizations; bugfixes
     63       2.15  (2017-03-18) fix png-1,2,4; all Imagenet JPGs; no runtime SSE detection on GCC
     64       2.14  (2017-03-03) remove deprecated STBI_JPEG_OLD; fixes for Imagenet JPGs
     65       2.13  (2016-12-04) experimental 16-bit API, only for PNG so far; fixes
     66       2.12  (2016-04-02) fix typo in 2.11 PSD fix that caused crashes
     67       2.11  (2016-04-02) 16-bit PNGS; enable SSE2 in non-gcc x64
     68                          RGB-format JPEG; remove white matting in PSD;
     69                          allocate large structures on the stack;
     70                          correct channel count for PNG & BMP
     71       2.10  (2016-01-22) avoid warning introduced in 2.09
     72       2.09  (2016-01-16) 16-bit TGA; comments in PNM files; STBI_REALLOC_SIZED
     73 
     74    See end of file for full revision history.
     75 
     76 
     77  ============================    Contributors    =========================
     78 
     79  Image formats                          Extensions, features
     80     Sean Barrett (jpeg, png, bmp)          Jetro Lauha (stbi_info)
     81     Nicolas Schulz (hdr, psd)              Martin "SpartanJ" Golini (stbi_info)
     82     Jonathan Dummer (tga)                  James "moose2000" Brown (iPhone PNG)
     83     Jean-Marc Lienher (gif)                Ben "Disch" Wenger (io callbacks)
     84     Tom Seddon (pic)                       Omar Cornut (1/2/4-bit PNG)
     85     Thatcher Ulrich (psd)                  Nicolas Guillemot (vertical flip)
     86     Ken Miller (pgm, ppm)                  Richard Mitton (16-bit PSD)
     87     github:urraka (animated gif)           Junggon Kim (PNM comments)
     88     Christopher Forseth (animated gif)     Daniel Gibson (16-bit TGA)
     89                                            socks-the-fox (16-bit PNG)
     90                                            Jeremy Sawicki (handle all ImageNet JPGs)
     91  Optimizations & bugfixes                  Mikhail Morozov (1-bit BMP)
     92     Fabian "ryg" Giesen                    Anael Seghezzi (is-16-bit query)
     93     Arseny Kapoulkine                      Simon Breuss (16-bit PNM)
     94     John-Mark Allen
     95     Carmelo J Fdez-Aguera
     96 
     97  Bug & warning fixes
     98     Marc LeBlanc            David Woo          Guillaume George     Martins Mozeiko
     99     Christpher Lloyd        Jerry Jansson      Joseph Thomson       Blazej Dariusz Roszkowski
    100     Phil Jordan                                Dave Moore           Roy Eltham
    101     Hayaki Saito            Nathan Reed        Won Chun
    102     Luke Graham             Johan Duparc       Nick Verigakis       the Horde3D community
    103     Thomas Ruf              Ronny Chevalier                         github:rlyeh
    104     Janez Zemva             John Bartholomew   Michal Cichon        github:romigrou
    105     Jonathan Blow           Ken Hamada         Tero Hanninen        github:svdijk
    106     Eugene Golushkov        Laurent Gomila     Cort Stratton        github:snagar
    107     Aruelien Pocheville     Sergio Gonzalez    Thibault Reuille     github:Zelex
    108     Cass Everitt            Ryamond Barbiero                        github:grim210
    109     Paul Du Bois            Engin Manap        Aldo Culquicondor    github:sammyhw
    110     Philipp Wiesemann       Dale Weiler        Oriol Ferrer Mesia   github:phprus
    111     Josh Tobin                                 Matthew Gregan       github:poppolopoppo
    112     Julian Raschke          Gregory Mullen     Christian Floisand   github:darealshinji
    113     Baldur Karlsson         Kevin Schmidt      JR Smith             github:Michaelangel007
    114                             Brad Weinberger    Matvey Cherevko      github:mosra
    115     Luca Sas                Alexander Veselov  Zack Middleton       [reserved]
    116     Ryan C. Gordon          [reserved]                              [reserved]
    117                      DO NOT ADD YOUR NAME HERE
    118 
    119                      Jacko Dirks
    120 
    121   To add your name to the credits, pick a random blank space in the middle and fill it.
    122   80% of merge conflicts on stb PRs are due to people adding their name at the end
    123   of the credits.
    124 */
    125 
    126 #ifndef STBI_INCLUDE_STB_IMAGE_H
    127 #define STBI_INCLUDE_STB_IMAGE_H
    128 
    129 // DOCUMENTATION
    130 //
    131 // Limitations:
    132 //    - no 12-bit-per-channel JPEG
    133 //    - no JPEGs with arithmetic coding
    134 //    - GIF always returns *comp=4
    135 //
    136 // Basic usage (see HDR discussion below for HDR usage):
    137 //    int x,y,n;
    138 //    unsigned char *data = stbi_load(filename, &x, &y, &n, 0);
    139 //    // ... process data if not NULL ...
    140 //    // ... x = width, y = height, n = # 8-bit components per pixel ...
    141 //    // ... replace '0' with '1'..'4' to force that many components per pixel
    142 //    // ... but 'n' will always be the number that it would have been if you said 0
    143 //    stbi_image_free(data)
    144 //
    145 // Standard parameters:
    146 //    int *x                 -- outputs image width in pixels
    147 //    int *y                 -- outputs image height in pixels
    148 //    int *channels_in_file  -- outputs # of image components in image file
    149 //    int desired_channels   -- if non-zero, # of image components requested in result
    150 //
    151 // The return value from an image loader is an 'unsigned char *' which points
    152 // to the pixel data, or NULL on an allocation failure or if the image is
    153 // corrupt or invalid. The pixel data consists of *y scanlines of *x pixels,
    154 // with each pixel consisting of N interleaved 8-bit components; the first
    155 // pixel pointed to is top-left-most in the image. There is no padding between
    156 // image scanlines or between pixels, regardless of format. The number of
    157 // components N is 'desired_channels' if desired_channels is non-zero, or
    158 // *channels_in_file otherwise. If desired_channels is non-zero,
    159 // *channels_in_file has the number of components that _would_ have been
    160 // output otherwise. E.g. if you set desired_channels to 4, you will always
    161 // get RGBA output, but you can check *channels_in_file to see if it's trivially
    162 // opaque because e.g. there were only 3 channels in the source image.
    163 //
    164 // An output image with N components has the following components interleaved
    165 // in this order in each pixel:
    166 //
    167 //     N=#comp     components
    168 //       1           grey
    169 //       2           grey, alpha
    170 //       3           red, green, blue
    171 //       4           red, green, blue, alpha
    172 //
    173 // If image loading fails for any reason, the return value will be NULL,
    174 // and *x, *y, *channels_in_file will be unchanged. The function
    175 // stbi_failure_reason() can be queried for an extremely brief, end-user
    176 // unfriendly explanation of why the load failed. Define STBI_NO_FAILURE_STRINGS
    177 // to avoid compiling these strings at all, and STBI_FAILURE_USERMSG to get slightly
    178 // more user-friendly ones.
    179 //
    180 // Paletted PNG, BMP, GIF, and PIC images are automatically depalettized.
    181 //
    182 // To query the width, height and component count of an image without having to
    183 // decode the full file, you can use the stbi_info family of functions:
    184 //
    185 //   int x,y,n,ok;
    186 //   ok = stbi_info(filename, &x, &y, &n);
    187 //   // returns ok=1 and sets x, y, n if image is a supported format,
    188 //   // 0 otherwise.
    189 //
    190 // Note that stb_image pervasively uses ints in its public API for sizes,
    191 // including sizes of memory buffers. This is now part of the API and thus
    192 // hard to change without causing breakage. As a result, the various image
    193 // loaders all have certain limits on image size; these differ somewhat
    194 // by format but generally boil down to either just under 2GB or just under
    195 // 1GB. When the decoded image would be larger than this, stb_image decoding
    196 // will fail.
    197 //
    198 // Additionally, stb_image will reject image files that have any of their
    199 // dimensions set to a larger value than the configurable STBI_MAX_DIMENSIONS,
    200 // which defaults to 2**24 = 16777216 pixels. Due to the above memory limit,
    201 // the only way to have an image with such dimensions load correctly
    202 // is for it to have a rather extreme aspect ratio. Either way, the
    203 // assumption here is that such larger images are likely to be malformed
    204 // or malicious. If you do need to load an image with individual dimensions
    205 // larger than that, and it still fits in the overall size limit, you can
    206 // #define STBI_MAX_DIMENSIONS on your own to be something larger.
    207 //
    208 // ===========================================================================
    209 //
    210 // UNICODE:
    211 //
    212 //   If compiling for Windows and you wish to use Unicode filenames, compile
    213 //   with
    214 //       #define STBI_WINDOWS_UTF8
    215 //   and pass utf8-encoded filenames. Call stbi_convert_wchar_to_utf8 to convert
    216 //   Windows wchar_t filenames to utf8.
    217 //
    218 // ===========================================================================
    219 //
    220 // Philosophy
    221 //
    222 // stb libraries are designed with the following priorities:
    223 //
    224 //    1. easy to use
    225 //    2. easy to maintain
    226 //    3. good performance
    227 //
    228 // Sometimes I let "good performance" creep up in priority over "easy to maintain",
    229 // and for best performance I may provide less-easy-to-use APIs that give higher
    230 // performance, in addition to the easy-to-use ones. Nevertheless, it's important
    231 // to keep in mind that from the standpoint of you, a client of this library,
    232 // all you care about is #1 and #3, and stb libraries DO NOT emphasize #3 above all.
    233 //
    234 // Some secondary priorities arise directly from the first two, some of which
    235 // provide more explicit reasons why performance can't be emphasized.
    236 //
    237 //    - Portable ("ease of use")
    238 //    - Small source code footprint ("easy to maintain")
    239 //    - No dependencies ("ease of use")
    240 //
    241 // ===========================================================================
    242 //
    243 // I/O callbacks
    244 //
    245 // I/O callbacks allow you to read from arbitrary sources, like packaged
    246 // files or some other source. Data read from callbacks are processed
    247 // through a small internal buffer (currently 128 bytes) to try to reduce
    248 // overhead.
    249 //
    250 // The three functions you must define are "read" (reads some bytes of data),
    251 // "skip" (skips some bytes of data), "eof" (reports if the stream is at the end).
    252 //
    253 // ===========================================================================
    254 //
    255 // SIMD support
    256 //
    257 // The JPEG decoder will try to automatically use SIMD kernels on x86 when
    258 // supported by the compiler. For ARM Neon support, you must explicitly
    259 // request it.
    260 //
    261 // (The old do-it-yourself SIMD API is no longer supported in the current
    262 // code.)
    263 //
    264 // On x86, SSE2 will automatically be used when available based on a run-time
    265 // test; if not, the generic C versions are used as a fall-back. On ARM targets,
    266 // the typical path is to have separate builds for NEON and non-NEON devices
    267 // (at least this is true for iOS and Android). Therefore, the NEON support is
    268 // toggled by a build flag: define STBI_NEON to get NEON loops.
    269 //
    270 // If for some reason you do not want to use any of SIMD code, or if
    271 // you have issues compiling it, you can disable it entirely by
    272 // defining STBI_NO_SIMD.
    273 //
    274 // ===========================================================================
    275 //
    276 // HDR image support   (disable by defining STBI_NO_HDR)
    277 //
    278 // stb_image supports loading HDR images in general, and currently the Radiance
    279 // .HDR file format specifically. You can still load any file through the existing
    280 // interface; if you attempt to load an HDR file, it will be automatically remapped
    281 // to LDR, assuming gamma 2.2 and an arbitrary scale factor defaulting to 1;
    282 // both of these constants can be reconfigured through this interface:
    283 //
    284 //     stbi_hdr_to_ldr_gamma(2.2f);
    285 //     stbi_hdr_to_ldr_scale(1.0f);
    286 //
    287 // (note, do not use _inverse_ constants; stbi_image will invert them
    288 // appropriately).
    289 //
    290 // Additionally, there is a new, parallel interface for loading files as
    291 // (linear) floats to preserve the full dynamic range:
    292 //
    293 //    float *data = stbi_loadf(filename, &x, &y, &n, 0);
    294 //
    295 // If you load LDR images through this interface, those images will
    296 // be promoted to floating point values, run through the inverse of
    297 // constants corresponding to the above:
    298 //
    299 //     stbi_ldr_to_hdr_scale(1.0f);
    300 //     stbi_ldr_to_hdr_gamma(2.2f);
    301 //
    302 // Finally, given a filename (or an open file or memory block--see header
    303 // file for details) containing image data, you can query for the "most
    304 // appropriate" interface to use (that is, whether the image is HDR or
    305 // not), using:
    306 //
    307 //     stbi_is_hdr(char *filename);
    308 //
    309 // ===========================================================================
    310 //
    311 // iPhone PNG support:
    312 //
    313 // We optionally support converting iPhone-formatted PNGs (which store
    314 // premultiplied BGRA) back to RGB, even though they're internally encoded
    315 // differently. To enable this conversion, call
    316 // stbi_convert_iphone_png_to_rgb(1).
    317 //
    318 // Call stbi_set_unpremultiply_on_load(1) as well to force a divide per
    319 // pixel to remove any premultiplied alpha *only* if the image file explicitly
    320 // says there's premultiplied data (currently only happens in iPhone images,
    321 // and only if iPhone convert-to-rgb processing is on).
    322 //
    323 // ===========================================================================
    324 //
    325 // ADDITIONAL CONFIGURATION
    326 //
    327 //  - You can suppress implementation of any of the decoders to reduce
    328 //    your code footprint by #defining one or more of the following
    329 //    symbols before creating the implementation.
    330 //
    331 //        STBI_NO_JPEG
    332 //        STBI_NO_PNG
    333 //        STBI_NO_BMP
    334 //        STBI_NO_PSD
    335 //        STBI_NO_TGA
    336 //        STBI_NO_GIF
    337 //        STBI_NO_HDR
    338 //        STBI_NO_PIC
    339 //        STBI_NO_PNM   (.ppm and .pgm)
    340 //
    341 //  - You can request *only* certain decoders and suppress all other ones
    342 //    (this will be more forward-compatible, as addition of new decoders
    343 //    doesn't require you to disable them explicitly):
    344 //
    345 //        STBI_ONLY_JPEG
    346 //        STBI_ONLY_PNG
    347 //        STBI_ONLY_BMP
    348 //        STBI_ONLY_PSD
    349 //        STBI_ONLY_TGA
    350 //        STBI_ONLY_GIF
    351 //        STBI_ONLY_HDR
    352 //        STBI_ONLY_PIC
    353 //        STBI_ONLY_PNM   (.ppm and .pgm)
    354 //
    355 //   - If you use STBI_NO_PNG (or _ONLY_ without PNG), and you still
    356 //     want the zlib decoder to be available, #define STBI_SUPPORT_ZLIB
    357 //
    358 //  - If you define STBI_MAX_DIMENSIONS, stb_image will reject images greater
    359 //    than that size (in either width or height) without further processing.
    360 //    This is to let programs in the wild set an upper bound to prevent
    361 //    denial-of-service attacks on untrusted data, as one could generate a
    362 //    valid image of gigantic dimensions and force stb_image to allocate a
    363 //    huge block of memory and spend disproportionate time decoding it. By
    364 //    default this is set to (1 << 24), which is 16777216, but that's still
    365 //    very big.
    366 
    367 #ifndef STBI_NO_STDIO
    368 #include <stdio.h>
    369 #endif // STBI_NO_STDIO
    370 
    371 #define STBI_VERSION 1
    372 
    373 enum
    374 {
    375    STBI_default = 0, // only used for desired_channels
    376 
    377    STBI_grey       = 1,
    378    STBI_grey_alpha = 2,
    379    STBI_rgb        = 3,
    380    STBI_rgb_alpha  = 4
    381 };
    382 
    383 #include <stdlib.h>
    384 typedef unsigned char stbi_uc;
    385 typedef unsigned short stbi_us;
    386 
    387 #ifdef __cplusplus
    388 extern "C" {
    389 #endif
    390 
    391 #ifndef STBIDEF
    392 #ifdef STB_IMAGE_STATIC
    393 #define STBIDEF static
    394 #else
    395 #define STBIDEF extern
    396 #endif
    397 #endif
    398 
    399 //////////////////////////////////////////////////////////////////////////////
    400 //
    401 // PRIMARY API - works on images of any type
    402 //
    403 
    404 //
    405 // load image by filename, open file, or memory buffer
    406 //
    407 
    408 typedef struct
    409 {
    410    int      (*read)  (void *user,char *data,int size);   // fill 'data' with 'size' bytes.  return number of bytes actually read
    411    void     (*skip)  (void *user,int n);                 // skip the next 'n' bytes, or 'unget' the last -n bytes if negative
    412    int      (*eof)   (void *user);                       // returns nonzero if we are at end of file/data
    413 } stbi_io_callbacks;
    414 
    415 ////////////////////////////////////
    416 //
    417 // 8-bits-per-channel interface
    418 //
    419 
    420 STBIDEF stbi_uc *stbi_load_from_memory   (stbi_uc           const *buffer, int len   , int *x, int *y, int *channels_in_file, int desired_channels);
    421 STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk  , void *user, int *x, int *y, int *channels_in_file, int desired_channels);
    422 
    423 #ifndef STBI_NO_STDIO
    424 STBIDEF stbi_uc *stbi_load            (char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
    425 STBIDEF stbi_uc *stbi_load_from_file  (FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
    426 // for stbi_load_from_file, file pointer is left pointing immediately after image
    427 #endif
    428 
    429 #ifndef STBI_NO_GIF
    430 STBIDEF stbi_uc *stbi_load_gif_from_memory(stbi_uc const *buffer, int len, int **delays, int *x, int *y, int *z, int *comp, int req_comp);
    431 #endif
    432 
    433 #ifdef STBI_WINDOWS_UTF8
    434 STBIDEF int stbi_convert_wchar_to_utf8(char *buffer, size_t bufferlen, const wchar_t* input);
    435 #endif
    436 
    437 ////////////////////////////////////
    438 //
    439 // 16-bits-per-channel interface
    440 //
    441 
    442 STBIDEF stbi_us *stbi_load_16_from_memory   (stbi_uc const *buffer, int len, int *x, int *y, int *channels_in_file, int desired_channels);
    443 STBIDEF stbi_us *stbi_load_16_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *channels_in_file, int desired_channels);
    444 
    445 #ifndef STBI_NO_STDIO
    446 STBIDEF stbi_us *stbi_load_16          (char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
    447 STBIDEF stbi_us *stbi_load_from_file_16(FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
    448 #endif
    449 
    450 ////////////////////////////////////
    451 //
    452 // float-per-channel interface
    453 //
    454 #ifndef STBI_NO_LINEAR
    455    STBIDEF float *stbi_loadf_from_memory     (stbi_uc const *buffer, int len, int *x, int *y, int *channels_in_file, int desired_channels);
    456    STBIDEF float *stbi_loadf_from_callbacks  (stbi_io_callbacks const *clbk, void *user, int *x, int *y,  int *channels_in_file, int desired_channels);
    457 
    458    #ifndef STBI_NO_STDIO
    459    STBIDEF float *stbi_loadf            (char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
    460    STBIDEF float *stbi_loadf_from_file  (FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
    461    #endif
    462 #endif
    463 
    464 #ifndef STBI_NO_HDR
    465    STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma);
    466    STBIDEF void   stbi_hdr_to_ldr_scale(float scale);
    467 #endif // STBI_NO_HDR
    468 
    469 #ifndef STBI_NO_LINEAR
    470    STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma);
    471    STBIDEF void   stbi_ldr_to_hdr_scale(float scale);
    472 #endif // STBI_NO_LINEAR
    473 
    474 // stbi_is_hdr is always defined, but always returns false if STBI_NO_HDR
    475 STBIDEF int    stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user);
    476 STBIDEF int    stbi_is_hdr_from_memory(stbi_uc const *buffer, int len);
    477 #ifndef STBI_NO_STDIO
    478 STBIDEF int      stbi_is_hdr          (char const *filename);
    479 STBIDEF int      stbi_is_hdr_from_file(FILE *f);
    480 #endif // STBI_NO_STDIO
    481 
    482 
    483 // get a VERY brief reason for failure
    484 // on most compilers (and ALL modern mainstream compilers) this is threadsafe
    485 STBIDEF const char *stbi_failure_reason  (void);
    486 
    487 // free the loaded image -- this is just free()
    488 STBIDEF void     stbi_image_free      (void *retval_from_stbi_load);
    489 
    490 // get image dimensions & components without fully decoding
    491 STBIDEF int      stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp);
    492 STBIDEF int      stbi_info_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp);
    493 STBIDEF int      stbi_is_16_bit_from_memory(stbi_uc const *buffer, int len);
    494 STBIDEF int      stbi_is_16_bit_from_callbacks(stbi_io_callbacks const *clbk, void *user);
    495 
    496 #ifndef STBI_NO_STDIO
    497 STBIDEF int      stbi_info               (char const *filename,     int *x, int *y, int *comp);
    498 STBIDEF int      stbi_info_from_file     (FILE *f,                  int *x, int *y, int *comp);
    499 STBIDEF int      stbi_is_16_bit          (char const *filename);
    500 STBIDEF int      stbi_is_16_bit_from_file(FILE *f);
    501 #endif
    502 
    503 
    504 
    505 // for image formats that explicitly notate that they have premultiplied alpha,
    506 // we just return the colors as stored in the file. set this flag to force
    507 // unpremultiplication. results are undefined if the unpremultiply overflow.
    508 STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply);
    509 
    510 // indicate whether we should process iphone images back to canonical format,
    511 // or just pass them through "as-is"
    512 STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert);
    513 
    514 // flip the image vertically, so the first pixel in the output array is the bottom left
    515 STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip);
    516 
    517 // as above, but only applies to images loaded on the thread that calls the function
    518 // this function is only available if your compiler supports thread-local variables;
    519 // calling it will fail to link if your compiler doesn't
    520 STBIDEF void stbi_set_unpremultiply_on_load_thread(int flag_true_if_should_unpremultiply);
    521 STBIDEF void stbi_convert_iphone_png_to_rgb_thread(int flag_true_if_should_convert);
    522 STBIDEF void stbi_set_flip_vertically_on_load_thread(int flag_true_if_should_flip);
    523 
    524 // ZLIB client - used by PNG, available for other purposes
    525 
    526 STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen);
    527 STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header);
    528 STBIDEF char *stbi_zlib_decode_malloc(const char *buffer, int len, int *outlen);
    529 STBIDEF int   stbi_zlib_decode_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
    530 
    531 STBIDEF char *stbi_zlib_decode_noheader_malloc(const char *buffer, int len, int *outlen);
    532 STBIDEF int   stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
    533 
    534 
    535 #ifdef __cplusplus
    536 }
    537 #endif
    538 
    539 //
    540 //
    541 ////   end header file   /////////////////////////////////////////////////////
    542 #endif // STBI_INCLUDE_STB_IMAGE_H
    543 
    544 #ifdef STB_IMAGE_IMPLEMENTATION
    545 
    546 #if defined(STBI_ONLY_JPEG) || defined(STBI_ONLY_PNG) || defined(STBI_ONLY_BMP) \
    547   || defined(STBI_ONLY_TGA) || defined(STBI_ONLY_GIF) || defined(STBI_ONLY_PSD) \
    548   || defined(STBI_ONLY_HDR) || defined(STBI_ONLY_PIC) || defined(STBI_ONLY_PNM) \
    549   || defined(STBI_ONLY_ZLIB)
    550    #ifndef STBI_ONLY_JPEG
    551    #define STBI_NO_JPEG
    552    #endif
    553    #ifndef STBI_ONLY_PNG
    554    #define STBI_NO_PNG
    555    #endif
    556    #ifndef STBI_ONLY_BMP
    557    #define STBI_NO_BMP
    558    #endif
    559    #ifndef STBI_ONLY_PSD
    560    #define STBI_NO_PSD
    561    #endif
    562    #ifndef STBI_ONLY_TGA
    563    #define STBI_NO_TGA
    564    #endif
    565    #ifndef STBI_ONLY_GIF
    566    #define STBI_NO_GIF
    567    #endif
    568    #ifndef STBI_ONLY_HDR
    569    #define STBI_NO_HDR
    570    #endif
    571    #ifndef STBI_ONLY_PIC
    572    #define STBI_NO_PIC
    573    #endif
    574    #ifndef STBI_ONLY_PNM
    575    #define STBI_NO_PNM
    576    #endif
    577 #endif
    578 
    579 #if defined(STBI_NO_PNG) && !defined(STBI_SUPPORT_ZLIB) && !defined(STBI_NO_ZLIB)
    580 #define STBI_NO_ZLIB
    581 #endif
    582 
    583 
    584 #include <stdarg.h>
    585 #include <stddef.h> // ptrdiff_t on osx
    586 #include <stdlib.h>
    587 #include <string.h>
    588 #include <limits.h>
    589 
    590 #if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR)
    591 #include <math.h>  // ldexp, pow
    592 #endif
    593 
    594 #ifndef STBI_NO_STDIO
    595 #include <stdio.h>
    596 #endif
    597 
    598 #ifndef STBI_ASSERT
    599 #include <assert.h>
    600 #define STBI_ASSERT(x) assert(x)
    601 #endif
    602 
    603 #ifdef __cplusplus
    604 #define STBI_EXTERN extern "C"
    605 #else
    606 #define STBI_EXTERN extern
    607 #endif
    608 
    609 
    610 #ifndef _MSC_VER
    611    #ifdef __cplusplus
    612    #define stbi_inline inline
    613    #else
    614    #define stbi_inline
    615    #endif
    616 #else
    617    #define stbi_inline __forceinline
    618 #endif
    619 
    620 #ifndef STBI_NO_THREAD_LOCALS
    621    #if defined(__cplusplus) &&  __cplusplus >= 201103L
    622       #define STBI_THREAD_LOCAL       thread_local
    623    #elif defined(__GNUC__) && __GNUC__ < 5
    624       #define STBI_THREAD_LOCAL       __thread
    625    #elif defined(_MSC_VER)
    626       #define STBI_THREAD_LOCAL       __declspec(thread)
    627    #elif defined (__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && !defined(__STDC_NO_THREADS__)
    628       #define STBI_THREAD_LOCAL       _Thread_local
    629    #endif
    630 
    631    #ifndef STBI_THREAD_LOCAL
    632       #if defined(__GNUC__)
    633         #define STBI_THREAD_LOCAL       __thread
    634       #endif
    635    #endif
    636 #endif
    637 
    638 #ifdef _MSC_VER
    639 typedef unsigned short stbi__uint16;
    640 typedef   signed short stbi__int16;
    641 typedef unsigned int   stbi__uint32;
    642 typedef   signed int   stbi__int32;
    643 #else
    644 #include <stdint.h>
    645 typedef uint16_t stbi__uint16;
    646 typedef int16_t  stbi__int16;
    647 typedef uint32_t stbi__uint32;
    648 typedef int32_t  stbi__int32;
    649 #endif
    650 
    651 // should produce compiler error if size is wrong
    652 typedef unsigned char validate_uint32[sizeof(stbi__uint32)==4 ? 1 : -1];
    653 
    654 #ifdef _MSC_VER
    655 #define STBI_NOTUSED(v)  (void)(v)
    656 #else
    657 #define STBI_NOTUSED(v)  (void)sizeof(v)
    658 #endif
    659 
    660 #ifdef _MSC_VER
    661 #define STBI_HAS_LROTL
    662 #endif
    663 
    664 #ifdef STBI_HAS_LROTL
    665    #define stbi_lrot(x,y)  _lrotl(x,y)
    666 #else
    667    #define stbi_lrot(x,y)  (((x) << (y)) | ((x) >> (-(y) & 31)))
    668 #endif
    669 
    670 #if defined(STBI_MALLOC) && defined(STBI_FREE) && (defined(STBI_REALLOC) || defined(STBI_REALLOC_SIZED))
    671 // ok
    672 #elif !defined(STBI_MALLOC) && !defined(STBI_FREE) && !defined(STBI_REALLOC) && !defined(STBI_REALLOC_SIZED)
    673 // ok
    674 #else
    675 #error "Must define all or none of STBI_MALLOC, STBI_FREE, and STBI_REALLOC (or STBI_REALLOC_SIZED)."
    676 #endif
    677 
    678 #ifndef STBI_MALLOC
    679 #define STBI_MALLOC(sz)           malloc(sz)
    680 #define STBI_REALLOC(p,newsz)     realloc(p,newsz)
    681 #define STBI_FREE(p)              free(p)
    682 #endif
    683 
    684 #ifndef STBI_REALLOC_SIZED
    685 #define STBI_REALLOC_SIZED(p,oldsz,newsz) STBI_REALLOC(p,newsz)
    686 #endif
    687 
    688 // x86/x64 detection
    689 #if defined(__x86_64__) || defined(_M_X64)
    690 #define STBI__X64_TARGET
    691 #elif defined(__i386) || defined(_M_IX86)
    692 #define STBI__X86_TARGET
    693 #endif
    694 
    695 #if defined(__GNUC__) && defined(STBI__X86_TARGET) && !defined(__SSE2__) && !defined(STBI_NO_SIMD)
    696 // gcc doesn't support sse2 intrinsics unless you compile with -msse2,
    697 // which in turn means it gets to use SSE2 everywhere. This is unfortunate,
    698 // but previous attempts to provide the SSE2 functions with runtime
    699 // detection caused numerous issues. The way architecture extensions are
    700 // exposed in GCC/Clang is, sadly, not really suited for one-file libs.
    701 // New behavior: if compiled with -msse2, we use SSE2 without any
    702 // detection; if not, we don't use it at all.
    703 #define STBI_NO_SIMD
    704 #endif
    705 
    706 #if defined(__MINGW32__) && defined(STBI__X86_TARGET) && !defined(STBI_MINGW_ENABLE_SSE2) && !defined(STBI_NO_SIMD)
    707 // Note that __MINGW32__ doesn't actually mean 32-bit, so we have to avoid STBI__X64_TARGET
    708 //
    709 // 32-bit MinGW wants ESP to be 16-byte aligned, but this is not in the
    710 // Windows ABI and VC++ as well as Windows DLLs don't maintain that invariant.
    711 // As a result, enabling SSE2 on 32-bit MinGW is dangerous when not
    712 // simultaneously enabling "-mstackrealign".
    713 //
    714 // See https://github.com/nothings/stb/issues/81 for more information.
    715 //
    716 // So default to no SSE2 on 32-bit MinGW. If you've read this far and added
    717 // -mstackrealign to your build settings, feel free to #define STBI_MINGW_ENABLE_SSE2.
    718 #define STBI_NO_SIMD
    719 #endif
    720 
    721 #if !defined(STBI_NO_SIMD) && (defined(STBI__X86_TARGET) || defined(STBI__X64_TARGET))
    722 #define STBI_SSE2
    723 #include <emmintrin.h>
    724 
    725 #ifdef _MSC_VER
    726 
    727 #if _MSC_VER >= 1400  // not VC6
    728 #include <intrin.h> // __cpuid
    729 static int stbi__cpuid3(void)
    730 {
    731    int info[4];
    732    __cpuid(info,1);
    733    return info[3];
    734 }
    735 #else
    736 static int stbi__cpuid3(void)
    737 {
    738    int res;
    739    __asm {
    740       mov  eax,1
    741       cpuid
    742       mov  res,edx
    743    }
    744    return res;
    745 }
    746 #endif
    747 
    748 #define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name
    749 
    750 #if !defined(STBI_NO_JPEG) && defined(STBI_SSE2)
    751 static int stbi__sse2_available(void)
    752 {
    753    int info3 = stbi__cpuid3();
    754    return ((info3 >> 26) & 1) != 0;
    755 }
    756 #endif
    757 
    758 #else // assume GCC-style if not VC++
    759 #define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
    760 
    761 #if !defined(STBI_NO_JPEG) && defined(STBI_SSE2)
    762 static int stbi__sse2_available(void)
    763 {
    764    // If we're even attempting to compile this on GCC/Clang, that means
    765    // -msse2 is on, which means the compiler is allowed to use SSE2
    766    // instructions at will, and so are we.
    767    return 1;
    768 }
    769 #endif
    770 
    771 #endif
    772 #endif
    773 
    774 // ARM NEON
    775 #if defined(STBI_NO_SIMD) && defined(STBI_NEON)
    776 #undef STBI_NEON
    777 #endif
    778 
    779 #ifdef STBI_NEON
    780 #include <arm_neon.h>
    781 #ifdef _MSC_VER
    782 #define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name
    783 #else
    784 #define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
    785 #endif
    786 #endif
    787 
    788 #ifndef STBI_SIMD_ALIGN
    789 #define STBI_SIMD_ALIGN(type, name) type name
    790 #endif
    791 
    792 #ifndef STBI_MAX_DIMENSIONS
    793 #define STBI_MAX_DIMENSIONS (1 << 24)
    794 #endif
    795 
    796 ///////////////////////////////////////////////
    797 //
    798 //  stbi__context struct and start_xxx functions
    799 
    800 // stbi__context structure is our basic context used by all images, so it
    801 // contains all the IO context, plus some basic image information
    802 typedef struct
    803 {
    804    stbi__uint32 img_x, img_y;
    805    int img_n, img_out_n;
    806 
    807    stbi_io_callbacks io;
    808    void *io_user_data;
    809 
    810    int read_from_callbacks;
    811    int buflen;
    812    stbi_uc buffer_start[128];
    813    int callback_already_read;
    814 
    815    stbi_uc *img_buffer, *img_buffer_end;
    816    stbi_uc *img_buffer_original, *img_buffer_original_end;
    817 } stbi__context;
    818 
    819 
    820 static void stbi__refill_buffer(stbi__context *s);
    821 
    822 // initialize a memory-decode context
    823 static void stbi__start_mem(stbi__context *s, stbi_uc const *buffer, int len)
    824 {
    825    s->io.read = NULL;
    826    s->read_from_callbacks = 0;
    827    s->callback_already_read = 0;
    828    s->img_buffer = s->img_buffer_original = (stbi_uc *) buffer;
    829    s->img_buffer_end = s->img_buffer_original_end = (stbi_uc *) buffer+len;
    830 }
    831 
    832 // initialize a callback-based context
    833 static void stbi__start_callbacks(stbi__context *s, stbi_io_callbacks *c, void *user)
    834 {
    835    s->io = *c;
    836    s->io_user_data = user;
    837    s->buflen = sizeof(s->buffer_start);
    838    s->read_from_callbacks = 1;
    839    s->callback_already_read = 0;
    840    s->img_buffer = s->img_buffer_original = s->buffer_start;
    841    stbi__refill_buffer(s);
    842    s->img_buffer_original_end = s->img_buffer_end;
    843 }
    844 
    845 #ifndef STBI_NO_STDIO
    846 
    847 static int stbi__stdio_read(void *user, char *data, int size)
    848 {
    849    return (int) fread(data,1,size,(FILE*) user);
    850 }
    851 
    852 static void stbi__stdio_skip(void *user, int n)
    853 {
    854    int ch;
    855    fseek((FILE*) user, n, SEEK_CUR);
    856    ch = fgetc((FILE*) user);  /* have to read a byte to reset feof()'s flag */
    857    if (ch != EOF) {
    858       ungetc(ch, (FILE *) user);  /* push byte back onto stream if valid. */
    859    }
    860 }
    861 
    862 static int stbi__stdio_eof(void *user)
    863 {
    864    return feof((FILE*) user) || ferror((FILE *) user);
    865 }
    866 
    867 static stbi_io_callbacks stbi__stdio_callbacks =
    868 {
    869    stbi__stdio_read,
    870    stbi__stdio_skip,
    871    stbi__stdio_eof,
    872 };
    873 
    874 static void stbi__start_file(stbi__context *s, FILE *f)
    875 {
    876    stbi__start_callbacks(s, &stbi__stdio_callbacks, (void *) f);
    877 }
    878 
    879 //static void stop_file(stbi__context *s) { }
    880 
    881 #endif // !STBI_NO_STDIO
    882 
    883 static void stbi__rewind(stbi__context *s)
    884 {
    885    // conceptually rewind SHOULD rewind to the beginning of the stream,
    886    // but we just rewind to the beginning of the initial buffer, because
    887    // we only use it after doing 'test', which only ever looks at at most 92 bytes
    888    s->img_buffer = s->img_buffer_original;
    889    s->img_buffer_end = s->img_buffer_original_end;
    890 }
    891 
    892 enum
    893 {
    894    STBI_ORDER_RGB,
    895    STBI_ORDER_BGR
    896 };
    897 
    898 typedef struct
    899 {
    900    int bits_per_channel;
    901    int num_channels;
    902    int channel_order;
    903 } stbi__result_info;
    904 
    905 #ifndef STBI_NO_JPEG
    906 static int      stbi__jpeg_test(stbi__context *s);
    907 static void    *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
    908 static int      stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp);
    909 #endif
    910 
    911 #ifndef STBI_NO_PNG
    912 static int      stbi__png_test(stbi__context *s);
    913 static void    *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
    914 static int      stbi__png_info(stbi__context *s, int *x, int *y, int *comp);
    915 static int      stbi__png_is16(stbi__context *s);
    916 #endif
    917 
    918 #ifndef STBI_NO_BMP
    919 static int      stbi__bmp_test(stbi__context *s);
    920 static void    *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
    921 static int      stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp);
    922 #endif
    923 
    924 #ifndef STBI_NO_TGA
    925 static int      stbi__tga_test(stbi__context *s);
    926 static void    *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
    927 static int      stbi__tga_info(stbi__context *s, int *x, int *y, int *comp);
    928 #endif
    929 
    930 #ifndef STBI_NO_PSD
    931 static int      stbi__psd_test(stbi__context *s);
    932 static void    *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc);
    933 static int      stbi__psd_info(stbi__context *s, int *x, int *y, int *comp);
    934 static int      stbi__psd_is16(stbi__context *s);
    935 #endif
    936 
    937 #ifndef STBI_NO_HDR
    938 static int      stbi__hdr_test(stbi__context *s);
    939 static float   *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
    940 static int      stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp);
    941 #endif
    942 
    943 #ifndef STBI_NO_PIC
    944 static int      stbi__pic_test(stbi__context *s);
    945 static void    *stbi__pic_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
    946 static int      stbi__pic_info(stbi__context *s, int *x, int *y, int *comp);
    947 #endif
    948 
    949 #ifndef STBI_NO_GIF
    950 static int      stbi__gif_test(stbi__context *s);
    951 static void    *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
    952 static void    *stbi__load_gif_main(stbi__context *s, int **delays, int *x, int *y, int *z, int *comp, int req_comp);
    953 static int      stbi__gif_info(stbi__context *s, int *x, int *y, int *comp);
    954 #endif
    955 
    956 #ifndef STBI_NO_PNM
    957 static int      stbi__pnm_test(stbi__context *s);
    958 static void    *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
    959 static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp);
    960 static int      stbi__pnm_is16(stbi__context *s);
    961 #endif
    962 
    963 static
    964 #ifdef STBI_THREAD_LOCAL
    965 STBI_THREAD_LOCAL
    966 #endif
    967 const char *stbi__g_failure_reason;
    968 
    969 STBIDEF const char *stbi_failure_reason(void)
    970 {
    971    return stbi__g_failure_reason;
    972 }
    973 
    974 #ifndef STBI_NO_FAILURE_STRINGS
    975 static int stbi__err(const char *str)
    976 {
    977    stbi__g_failure_reason = str;
    978    return 0;
    979 }
    980 #endif
    981 
    982 static void *stbi__malloc(size_t size)
    983 {
    984     return STBI_MALLOC(size);
    985 }
    986 
    987 // stb_image uses ints pervasively, including for offset calculations.
    988 // therefore the largest decoded image size we can support with the
    989 // current code, even on 64-bit targets, is INT_MAX. this is not a
    990 // significant limitation for the intended use case.
    991 //
    992 // we do, however, need to make sure our size calculations don't
    993 // overflow. hence a few helper functions for size calculations that
    994 // multiply integers together, making sure that they're non-negative
    995 // and no overflow occurs.
    996 
    997 // return 1 if the sum is valid, 0 on overflow.
    998 // negative terms are considered invalid.
    999 static int stbi__addsizes_valid(int a, int b)
   1000 {
   1001    if (b < 0) return 0;
   1002    // now 0 <= b <= INT_MAX, hence also
   1003    // 0 <= INT_MAX - b <= INTMAX.
   1004    // And "a + b <= INT_MAX" (which might overflow) is the
   1005    // same as a <= INT_MAX - b (no overflow)
   1006    return a <= INT_MAX - b;
   1007 }
   1008 
   1009 // returns 1 if the product is valid, 0 on overflow.
   1010 // negative factors are considered invalid.
   1011 static int stbi__mul2sizes_valid(int a, int b)
   1012 {
   1013    if (a < 0 || b < 0) return 0;
   1014    if (b == 0) return 1; // mul-by-0 is always safe
   1015    // portable way to check for no overflows in a*b
   1016    return a <= INT_MAX/b;
   1017 }
   1018 
   1019 #if !defined(STBI_NO_JPEG) || !defined(STBI_NO_PNG) || !defined(STBI_NO_TGA) || !defined(STBI_NO_HDR)
   1020 // returns 1 if "a*b + add" has no negative terms/factors and doesn't overflow
   1021 static int stbi__mad2sizes_valid(int a, int b, int add)
   1022 {
   1023    return stbi__mul2sizes_valid(a, b) && stbi__addsizes_valid(a*b, add);
   1024 }
   1025 #endif
   1026 
   1027 // returns 1 if "a*b*c + add" has no negative terms/factors and doesn't overflow
   1028 static int stbi__mad3sizes_valid(int a, int b, int c, int add)
   1029 {
   1030    return stbi__mul2sizes_valid(a, b) && stbi__mul2sizes_valid(a*b, c) &&
   1031       stbi__addsizes_valid(a*b*c, add);
   1032 }
   1033 
   1034 // returns 1 if "a*b*c*d + add" has no negative terms/factors and doesn't overflow
   1035 #if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR)
   1036 static int stbi__mad4sizes_valid(int a, int b, int c, int d, int add)
   1037 {
   1038    return stbi__mul2sizes_valid(a, b) && stbi__mul2sizes_valid(a*b, c) &&
   1039       stbi__mul2sizes_valid(a*b*c, d) && stbi__addsizes_valid(a*b*c*d, add);
   1040 }
   1041 #endif
   1042 
   1043 #if !defined(STBI_NO_JPEG) || !defined(STBI_NO_PNG) || !defined(STBI_NO_TGA) || !defined(STBI_NO_HDR)
   1044 // mallocs with size overflow checking
   1045 static void *stbi__malloc_mad2(int a, int b, int add)
   1046 {
   1047    if (!stbi__mad2sizes_valid(a, b, add)) return NULL;
   1048    return stbi__malloc(a*b + add);
   1049 }
   1050 #endif
   1051 
   1052 static void *stbi__malloc_mad3(int a, int b, int c, int add)
   1053 {
   1054    if (!stbi__mad3sizes_valid(a, b, c, add)) return NULL;
   1055    return stbi__malloc(a*b*c + add);
   1056 }
   1057 
   1058 #if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR)
   1059 static void *stbi__malloc_mad4(int a, int b, int c, int d, int add)
   1060 {
   1061    if (!stbi__mad4sizes_valid(a, b, c, d, add)) return NULL;
   1062    return stbi__malloc(a*b*c*d + add);
   1063 }
   1064 #endif
   1065 
   1066 // stbi__err - error
   1067 // stbi__errpf - error returning pointer to float
   1068 // stbi__errpuc - error returning pointer to unsigned char
   1069 
   1070 #ifdef STBI_NO_FAILURE_STRINGS
   1071    #define stbi__err(x,y)  0
   1072 #elif defined(STBI_FAILURE_USERMSG)
   1073    #define stbi__err(x,y)  stbi__err(y)
   1074 #else
   1075    #define stbi__err(x,y)  stbi__err(x)
   1076 #endif
   1077 
   1078 #define stbi__errpf(x,y)   ((float *)(size_t) (stbi__err(x,y)?NULL:NULL))
   1079 #define stbi__errpuc(x,y)  ((unsigned char *)(size_t) (stbi__err(x,y)?NULL:NULL))
   1080 
   1081 STBIDEF void stbi_image_free(void *retval_from_stbi_load)
   1082 {
   1083    STBI_FREE(retval_from_stbi_load);
   1084 }
   1085 
   1086 #ifndef STBI_NO_LINEAR
   1087 static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp);
   1088 #endif
   1089 
   1090 #ifndef STBI_NO_HDR
   1091 static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp);
   1092 #endif
   1093 
   1094 static int stbi__vertically_flip_on_load_global = 0;
   1095 
   1096 STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip)
   1097 {
   1098    stbi__vertically_flip_on_load_global = flag_true_if_should_flip;
   1099 }
   1100 
   1101 #ifndef STBI_THREAD_LOCAL
   1102 #define stbi__vertically_flip_on_load  stbi__vertically_flip_on_load_global
   1103 #else
   1104 static STBI_THREAD_LOCAL int stbi__vertically_flip_on_load_local, stbi__vertically_flip_on_load_set;
   1105 
   1106 STBIDEF void stbi_set_flip_vertically_on_load_thread(int flag_true_if_should_flip)
   1107 {
   1108    stbi__vertically_flip_on_load_local = flag_true_if_should_flip;
   1109    stbi__vertically_flip_on_load_set = 1;
   1110 }
   1111 
   1112 #define stbi__vertically_flip_on_load  (stbi__vertically_flip_on_load_set       \
   1113                                          ? stbi__vertically_flip_on_load_local  \
   1114                                          : stbi__vertically_flip_on_load_global)
   1115 #endif // STBI_THREAD_LOCAL
   1116 
   1117 static void *stbi__load_main(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc)
   1118 {
   1119    memset(ri, 0, sizeof(*ri)); // make sure it's initialized if we add new fields
   1120    ri->bits_per_channel = 8; // default is 8 so most paths don't have to be changed
   1121    ri->channel_order = STBI_ORDER_RGB; // all current input & output are this, but this is here so we can add BGR order
   1122    ri->num_channels = 0;
   1123 
   1124    // test the formats with a very explicit header first (at least a FOURCC
   1125    // or distinctive magic number first)
   1126    #ifndef STBI_NO_PNG
   1127    if (stbi__png_test(s))  return stbi__png_load(s,x,y,comp,req_comp, ri);
   1128    #endif
   1129    #ifndef STBI_NO_BMP
   1130    if (stbi__bmp_test(s))  return stbi__bmp_load(s,x,y,comp,req_comp, ri);
   1131    #endif
   1132    #ifndef STBI_NO_GIF
   1133    if (stbi__gif_test(s))  return stbi__gif_load(s,x,y,comp,req_comp, ri);
   1134    #endif
   1135    #ifndef STBI_NO_PSD
   1136    if (stbi__psd_test(s))  return stbi__psd_load(s,x,y,comp,req_comp, ri, bpc);
   1137    #else
   1138    STBI_NOTUSED(bpc);
   1139    #endif
   1140    #ifndef STBI_NO_PIC
   1141    if (stbi__pic_test(s))  return stbi__pic_load(s,x,y,comp,req_comp, ri);
   1142    #endif
   1143 
   1144    // then the formats that can end up attempting to load with just 1 or 2
   1145    // bytes matching expectations; these are prone to false positives, so
   1146    // try them later
   1147    #ifndef STBI_NO_JPEG
   1148    if (stbi__jpeg_test(s)) return stbi__jpeg_load(s,x,y,comp,req_comp, ri);
   1149    #endif
   1150    #ifndef STBI_NO_PNM
   1151    if (stbi__pnm_test(s))  return stbi__pnm_load(s,x,y,comp,req_comp, ri);
   1152    #endif
   1153 
   1154    #ifndef STBI_NO_HDR
   1155    if (stbi__hdr_test(s)) {
   1156       float *hdr = stbi__hdr_load(s, x,y,comp,req_comp, ri);
   1157       return stbi__hdr_to_ldr(hdr, *x, *y, req_comp ? req_comp : *comp);
   1158    }
   1159    #endif
   1160 
   1161    #ifndef STBI_NO_TGA
   1162    // test tga last because it's a crappy test!
   1163    if (stbi__tga_test(s))
   1164       return stbi__tga_load(s,x,y,comp,req_comp, ri);
   1165    #endif
   1166 
   1167    return stbi__errpuc("unknown image type", "Image not of any known type, or corrupt");
   1168 }
   1169 
   1170 static stbi_uc *stbi__convert_16_to_8(stbi__uint16 *orig, int w, int h, int channels)
   1171 {
   1172    int i;
   1173    int img_len = w * h * channels;
   1174    stbi_uc *reduced;
   1175 
   1176    reduced = (stbi_uc *) stbi__malloc(img_len);
   1177    if (reduced == NULL) return stbi__errpuc("outofmem", "Out of memory");
   1178 
   1179    for (i = 0; i < img_len; ++i)
   1180       reduced[i] = (stbi_uc)((orig[i] >> 8) & 0xFF); // top half of each byte is sufficient approx of 16->8 bit scaling
   1181 
   1182    STBI_FREE(orig);
   1183    return reduced;
   1184 }
   1185 
   1186 static stbi__uint16 *stbi__convert_8_to_16(stbi_uc *orig, int w, int h, int channels)
   1187 {
   1188    int i;
   1189    int img_len = w * h * channels;
   1190    stbi__uint16 *enlarged;
   1191 
   1192    enlarged = (stbi__uint16 *) stbi__malloc(img_len*2);
   1193    if (enlarged == NULL) return (stbi__uint16 *) stbi__errpuc("outofmem", "Out of memory");
   1194 
   1195    for (i = 0; i < img_len; ++i)
   1196       enlarged[i] = (stbi__uint16)((orig[i] << 8) + orig[i]); // replicate to high and low byte, maps 0->0, 255->0xffff
   1197 
   1198    STBI_FREE(orig);
   1199    return enlarged;
   1200 }
   1201 
   1202 static void stbi__vertical_flip(void *image, int w, int h, int bytes_per_pixel)
   1203 {
   1204    int row;
   1205    size_t bytes_per_row = (size_t)w * bytes_per_pixel;
   1206    stbi_uc temp[2048];
   1207    stbi_uc *bytes = (stbi_uc *)image;
   1208 
   1209    for (row = 0; row < (h>>1); row++) {
   1210       stbi_uc *row0 = bytes + row*bytes_per_row;
   1211       stbi_uc *row1 = bytes + (h - row - 1)*bytes_per_row;
   1212       // swap row0 with row1
   1213       size_t bytes_left = bytes_per_row;
   1214       while (bytes_left) {
   1215          size_t bytes_copy = (bytes_left < sizeof(temp)) ? bytes_left : sizeof(temp);
   1216          memcpy(temp, row0, bytes_copy);
   1217          memcpy(row0, row1, bytes_copy);
   1218          memcpy(row1, temp, bytes_copy);
   1219          row0 += bytes_copy;
   1220          row1 += bytes_copy;
   1221          bytes_left -= bytes_copy;
   1222       }
   1223    }
   1224 }
   1225 
   1226 #ifndef STBI_NO_GIF
   1227 static void stbi__vertical_flip_slices(void *image, int w, int h, int z, int bytes_per_pixel)
   1228 {
   1229    int slice;
   1230    int slice_size = w * h * bytes_per_pixel;
   1231 
   1232    stbi_uc *bytes = (stbi_uc *)image;
   1233    for (slice = 0; slice < z; ++slice) {
   1234       stbi__vertical_flip(bytes, w, h, bytes_per_pixel);
   1235       bytes += slice_size;
   1236    }
   1237 }
   1238 #endif
   1239 
   1240 static unsigned char *stbi__load_and_postprocess_8bit(stbi__context *s, int *x, int *y, int *comp, int req_comp)
   1241 {
   1242    stbi__result_info ri;
   1243    void *result = stbi__load_main(s, x, y, comp, req_comp, &ri, 8);
   1244 
   1245    if (result == NULL)
   1246       return NULL;
   1247 
   1248    // it is the responsibility of the loaders to make sure we get either 8 or 16 bit.
   1249    STBI_ASSERT(ri.bits_per_channel == 8 || ri.bits_per_channel == 16);
   1250 
   1251    if (ri.bits_per_channel != 8) {
   1252       result = stbi__convert_16_to_8((stbi__uint16 *) result, *x, *y, req_comp == 0 ? *comp : req_comp);
   1253       ri.bits_per_channel = 8;
   1254    }
   1255 
   1256    // @TODO: move stbi__convert_format to here
   1257 
   1258    if (stbi__vertically_flip_on_load) {
   1259       int channels = req_comp ? req_comp : *comp;
   1260       stbi__vertical_flip(result, *x, *y, channels * sizeof(stbi_uc));
   1261    }
   1262 
   1263    return (unsigned char *) result;
   1264 }
   1265 
   1266 static stbi__uint16 *stbi__load_and_postprocess_16bit(stbi__context *s, int *x, int *y, int *comp, int req_comp)
   1267 {
   1268    stbi__result_info ri;
   1269    void *result = stbi__load_main(s, x, y, comp, req_comp, &ri, 16);
   1270 
   1271    if (result == NULL)
   1272       return NULL;
   1273 
   1274    // it is the responsibility of the loaders to make sure we get either 8 or 16 bit.
   1275    STBI_ASSERT(ri.bits_per_channel == 8 || ri.bits_per_channel == 16);
   1276 
   1277    if (ri.bits_per_channel != 16) {
   1278       result = stbi__convert_8_to_16((stbi_uc *) result, *x, *y, req_comp == 0 ? *comp : req_comp);
   1279       ri.bits_per_channel = 16;
   1280    }
   1281 
   1282    // @TODO: move stbi__convert_format16 to here
   1283    // @TODO: special case RGB-to-Y (and RGBA-to-YA) for 8-bit-to-16-bit case to keep more precision
   1284 
   1285    if (stbi__vertically_flip_on_load) {
   1286       int channels = req_comp ? req_comp : *comp;
   1287       stbi__vertical_flip(result, *x, *y, channels * sizeof(stbi__uint16));
   1288    }
   1289 
   1290    return (stbi__uint16 *) result;
   1291 }
   1292 
   1293 #if !defined(STBI_NO_HDR) && !defined(STBI_NO_LINEAR)
   1294 static void stbi__float_postprocess(float *result, int *x, int *y, int *comp, int req_comp)
   1295 {
   1296    if (stbi__vertically_flip_on_load && result != NULL) {
   1297       int channels = req_comp ? req_comp : *comp;
   1298       stbi__vertical_flip(result, *x, *y, channels * sizeof(float));
   1299    }
   1300 }
   1301 #endif
   1302 
   1303 #ifndef STBI_NO_STDIO
   1304 
   1305 #if defined(_WIN32) && defined(STBI_WINDOWS_UTF8)
   1306 STBI_EXTERN __declspec(dllimport) int __stdcall MultiByteToWideChar(unsigned int cp, unsigned long flags, const char *str, int cbmb, wchar_t *widestr, int cchwide);
   1307 STBI_EXTERN __declspec(dllimport) int __stdcall WideCharToMultiByte(unsigned int cp, unsigned long flags, const wchar_t *widestr, int cchwide, char *str, int cbmb, const char *defchar, int *used_default);
   1308 #endif
   1309 
   1310 #if defined(_WIN32) && defined(STBI_WINDOWS_UTF8)
   1311 STBIDEF int stbi_convert_wchar_to_utf8(char *buffer, size_t bufferlen, const wchar_t* input)
   1312 {
   1313 	return WideCharToMultiByte(65001 /* UTF8 */, 0, input, -1, buffer, (int) bufferlen, NULL, NULL);
   1314 }
   1315 #endif
   1316 
   1317 static FILE *stbi__fopen(char const *filename, char const *mode)
   1318 {
   1319    FILE *f;
   1320 #if defined(_WIN32) && defined(STBI_WINDOWS_UTF8)
   1321    wchar_t wMode[64];
   1322    wchar_t wFilename[1024];
   1323 	if (0 == MultiByteToWideChar(65001 /* UTF8 */, 0, filename, -1, wFilename, sizeof(wFilename)/sizeof(*wFilename)))
   1324       return 0;
   1325 
   1326 	if (0 == MultiByteToWideChar(65001 /* UTF8 */, 0, mode, -1, wMode, sizeof(wMode)/sizeof(*wMode)))
   1327       return 0;
   1328 
   1329 #if defined(_MSC_VER) && _MSC_VER >= 1400
   1330 	if (0 != _wfopen_s(&f, wFilename, wMode))
   1331 		f = 0;
   1332 #else
   1333    f = _wfopen(wFilename, wMode);
   1334 #endif
   1335 
   1336 #elif defined(_MSC_VER) && _MSC_VER >= 1400
   1337    if (0 != fopen_s(&f, filename, mode))
   1338       f=0;
   1339 #else
   1340    f = fopen(filename, mode);
   1341 #endif
   1342    return f;
   1343 }
   1344 
   1345 
   1346 STBIDEF stbi_uc *stbi_load(char const *filename, int *x, int *y, int *comp, int req_comp)
   1347 {
   1348    FILE *f = stbi__fopen(filename, "rb");
   1349    unsigned char *result;
   1350    if (!f) return stbi__errpuc("can't fopen", "Unable to open file");
   1351    result = stbi_load_from_file(f,x,y,comp,req_comp);
   1352    fclose(f);
   1353    return result;
   1354 }
   1355 
   1356 STBIDEF stbi_uc *stbi_load_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
   1357 {
   1358    unsigned char *result;
   1359    stbi__context s;
   1360    stbi__start_file(&s,f);
   1361    result = stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp);
   1362    if (result) {
   1363       // need to 'unget' all the characters in the IO buffer
   1364       fseek(f, - (int) (s.img_buffer_end - s.img_buffer), SEEK_CUR);
   1365    }
   1366    return result;
   1367 }
   1368 
   1369 STBIDEF stbi__uint16 *stbi_load_from_file_16(FILE *f, int *x, int *y, int *comp, int req_comp)
   1370 {
   1371    stbi__uint16 *result;
   1372    stbi__context s;
   1373    stbi__start_file(&s,f);
   1374    result = stbi__load_and_postprocess_16bit(&s,x,y,comp,req_comp);
   1375    if (result) {
   1376       // need to 'unget' all the characters in the IO buffer
   1377       fseek(f, - (int) (s.img_buffer_end - s.img_buffer), SEEK_CUR);
   1378    }
   1379    return result;
   1380 }
   1381 
   1382 STBIDEF stbi_us *stbi_load_16(char const *filename, int *x, int *y, int *comp, int req_comp)
   1383 {
   1384    FILE *f = stbi__fopen(filename, "rb");
   1385    stbi__uint16 *result;
   1386    if (!f) return (stbi_us *) stbi__errpuc("can't fopen", "Unable to open file");
   1387    result = stbi_load_from_file_16(f,x,y,comp,req_comp);
   1388    fclose(f);
   1389    return result;
   1390 }
   1391 
   1392 
   1393 #endif //!STBI_NO_STDIO
   1394 
   1395 STBIDEF stbi_us *stbi_load_16_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *channels_in_file, int desired_channels)
   1396 {
   1397    stbi__context s;
   1398    stbi__start_mem(&s,buffer,len);
   1399    return stbi__load_and_postprocess_16bit(&s,x,y,channels_in_file,desired_channels);
   1400 }
   1401 
   1402 STBIDEF stbi_us *stbi_load_16_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *channels_in_file, int desired_channels)
   1403 {
   1404    stbi__context s;
   1405    stbi__start_callbacks(&s, (stbi_io_callbacks *)clbk, user);
   1406    return stbi__load_and_postprocess_16bit(&s,x,y,channels_in_file,desired_channels);
   1407 }
   1408 
   1409 STBIDEF stbi_uc *stbi_load_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
   1410 {
   1411    stbi__context s;
   1412    stbi__start_mem(&s,buffer,len);
   1413    return stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp);
   1414 }
   1415 
   1416 STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
   1417 {
   1418    stbi__context s;
   1419    stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
   1420    return stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp);
   1421 }
   1422 
   1423 #ifndef STBI_NO_GIF
   1424 STBIDEF stbi_uc *stbi_load_gif_from_memory(stbi_uc const *buffer, int len, int **delays, int *x, int *y, int *z, int *comp, int req_comp)
   1425 {
   1426    unsigned char *result;
   1427    stbi__context s;
   1428    stbi__start_mem(&s,buffer,len);
   1429 
   1430    result = (unsigned char*) stbi__load_gif_main(&s, delays, x, y, z, comp, req_comp);
   1431    if (stbi__vertically_flip_on_load) {
   1432       stbi__vertical_flip_slices( result, *x, *y, *z, *comp );
   1433    }
   1434 
   1435    return result;
   1436 }
   1437 #endif
   1438 
   1439 #ifndef STBI_NO_LINEAR
   1440 static float *stbi__loadf_main(stbi__context *s, int *x, int *y, int *comp, int req_comp)
   1441 {
   1442    unsigned char *data;
   1443    #ifndef STBI_NO_HDR
   1444    if (stbi__hdr_test(s)) {
   1445       stbi__result_info ri;
   1446       float *hdr_data = stbi__hdr_load(s,x,y,comp,req_comp, &ri);
   1447       if (hdr_data)
   1448          stbi__float_postprocess(hdr_data,x,y,comp,req_comp);
   1449       return hdr_data;
   1450    }
   1451    #endif
   1452    data = stbi__load_and_postprocess_8bit(s, x, y, comp, req_comp);
   1453    if (data)
   1454       return stbi__ldr_to_hdr(data, *x, *y, req_comp ? req_comp : *comp);
   1455    return stbi__errpf("unknown image type", "Image not of any known type, or corrupt");
   1456 }
   1457 
   1458 STBIDEF float *stbi_loadf_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
   1459 {
   1460    stbi__context s;
   1461    stbi__start_mem(&s,buffer,len);
   1462    return stbi__loadf_main(&s,x,y,comp,req_comp);
   1463 }
   1464 
   1465 STBIDEF float *stbi_loadf_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
   1466 {
   1467    stbi__context s;
   1468    stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
   1469    return stbi__loadf_main(&s,x,y,comp,req_comp);
   1470 }
   1471 
   1472 #ifndef STBI_NO_STDIO
   1473 STBIDEF float *stbi_loadf(char const *filename, int *x, int *y, int *comp, int req_comp)
   1474 {
   1475    float *result;
   1476    FILE *f = stbi__fopen(filename, "rb");
   1477    if (!f) return stbi__errpf("can't fopen", "Unable to open file");
   1478    result = stbi_loadf_from_file(f,x,y,comp,req_comp);
   1479    fclose(f);
   1480    return result;
   1481 }
   1482 
   1483 STBIDEF float *stbi_loadf_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
   1484 {
   1485    stbi__context s;
   1486    stbi__start_file(&s,f);
   1487    return stbi__loadf_main(&s,x,y,comp,req_comp);
   1488 }
   1489 #endif // !STBI_NO_STDIO
   1490 
   1491 #endif // !STBI_NO_LINEAR
   1492 
   1493 // these is-hdr-or-not is defined independent of whether STBI_NO_LINEAR is
   1494 // defined, for API simplicity; if STBI_NO_LINEAR is defined, it always
   1495 // reports false!
   1496 
   1497 STBIDEF int stbi_is_hdr_from_memory(stbi_uc const *buffer, int len)
   1498 {
   1499    #ifndef STBI_NO_HDR
   1500    stbi__context s;
   1501    stbi__start_mem(&s,buffer,len);
   1502    return stbi__hdr_test(&s);
   1503    #else
   1504    STBI_NOTUSED(buffer);
   1505    STBI_NOTUSED(len);
   1506    return 0;
   1507    #endif
   1508 }
   1509 
   1510 #ifndef STBI_NO_STDIO
   1511 STBIDEF int      stbi_is_hdr          (char const *filename)
   1512 {
   1513    FILE *f = stbi__fopen(filename, "rb");
   1514    int result=0;
   1515    if (f) {
   1516       result = stbi_is_hdr_from_file(f);
   1517       fclose(f);
   1518    }
   1519    return result;
   1520 }
   1521 
   1522 STBIDEF int stbi_is_hdr_from_file(FILE *f)
   1523 {
   1524    #ifndef STBI_NO_HDR
   1525    long pos = ftell(f);
   1526    int res;
   1527    stbi__context s;
   1528    stbi__start_file(&s,f);
   1529    res = stbi__hdr_test(&s);
   1530    fseek(f, pos, SEEK_SET);
   1531    return res;
   1532    #else
   1533    STBI_NOTUSED(f);
   1534    return 0;
   1535    #endif
   1536 }
   1537 #endif // !STBI_NO_STDIO
   1538 
   1539 STBIDEF int      stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user)
   1540 {
   1541    #ifndef STBI_NO_HDR
   1542    stbi__context s;
   1543    stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
   1544    return stbi__hdr_test(&s);
   1545    #else
   1546    STBI_NOTUSED(clbk);
   1547    STBI_NOTUSED(user);
   1548    return 0;
   1549    #endif
   1550 }
   1551 
   1552 #ifndef STBI_NO_LINEAR
   1553 static float stbi__l2h_gamma=2.2f, stbi__l2h_scale=1.0f;
   1554 
   1555 STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma) { stbi__l2h_gamma = gamma; }
   1556 STBIDEF void   stbi_ldr_to_hdr_scale(float scale) { stbi__l2h_scale = scale; }
   1557 #endif
   1558 
   1559 static float stbi__h2l_gamma_i=1.0f/2.2f, stbi__h2l_scale_i=1.0f;
   1560 
   1561 STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma) { stbi__h2l_gamma_i = 1/gamma; }
   1562 STBIDEF void   stbi_hdr_to_ldr_scale(float scale) { stbi__h2l_scale_i = 1/scale; }
   1563 
   1564 
   1565 //////////////////////////////////////////////////////////////////////////////
   1566 //
   1567 // Common code used by all image loaders
   1568 //
   1569 
   1570 enum
   1571 {
   1572    STBI__SCAN_load=0,
   1573    STBI__SCAN_type,
   1574    STBI__SCAN_header
   1575 };
   1576 
   1577 static void stbi__refill_buffer(stbi__context *s)
   1578 {
   1579    int n = (s->io.read)(s->io_user_data,(char*)s->buffer_start,s->buflen);
   1580    s->callback_already_read += (int) (s->img_buffer - s->img_buffer_original);
   1581    if (n == 0) {
   1582       // at end of file, treat same as if from memory, but need to handle case
   1583       // where s->img_buffer isn't pointing to safe memory, e.g. 0-byte file
   1584       s->read_from_callbacks = 0;
   1585       s->img_buffer = s->buffer_start;
   1586       s->img_buffer_end = s->buffer_start+1;
   1587       *s->img_buffer = 0;
   1588    } else {
   1589       s->img_buffer = s->buffer_start;
   1590       s->img_buffer_end = s->buffer_start + n;
   1591    }
   1592 }
   1593 
   1594 stbi_inline static stbi_uc stbi__get8(stbi__context *s)
   1595 {
   1596    if (s->img_buffer < s->img_buffer_end)
   1597       return *s->img_buffer++;
   1598    if (s->read_from_callbacks) {
   1599       stbi__refill_buffer(s);
   1600       return *s->img_buffer++;
   1601    }
   1602    return 0;
   1603 }
   1604 
   1605 #if defined(STBI_NO_JPEG) && defined(STBI_NO_HDR) && defined(STBI_NO_PIC) && defined(STBI_NO_PNM)
   1606 // nothing
   1607 #else
   1608 stbi_inline static int stbi__at_eof(stbi__context *s)
   1609 {
   1610    if (s->io.read) {
   1611       if (!(s->io.eof)(s->io_user_data)) return 0;
   1612       // if feof() is true, check if buffer = end
   1613       // special case: we've only got the special 0 character at the end
   1614       if (s->read_from_callbacks == 0) return 1;
   1615    }
   1616 
   1617    return s->img_buffer >= s->img_buffer_end;
   1618 }
   1619 #endif
   1620 
   1621 #if defined(STBI_NO_JPEG) && defined(STBI_NO_PNG) && defined(STBI_NO_BMP) && defined(STBI_NO_PSD) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF) && defined(STBI_NO_PIC)
   1622 // nothing
   1623 #else
   1624 static void stbi__skip(stbi__context *s, int n)
   1625 {
   1626    if (n == 0) return;  // already there!
   1627    if (n < 0) {
   1628       s->img_buffer = s->img_buffer_end;
   1629       return;
   1630    }
   1631    if (s->io.read) {
   1632       int blen = (int) (s->img_buffer_end - s->img_buffer);
   1633       if (blen < n) {
   1634          s->img_buffer = s->img_buffer_end;
   1635          (s->io.skip)(s->io_user_data, n - blen);
   1636          return;
   1637       }
   1638    }
   1639    s->img_buffer += n;
   1640 }
   1641 #endif
   1642 
   1643 #if defined(STBI_NO_PNG) && defined(STBI_NO_TGA) && defined(STBI_NO_HDR) && defined(STBI_NO_PNM)
   1644 // nothing
   1645 #else
   1646 static int stbi__getn(stbi__context *s, stbi_uc *buffer, int n)
   1647 {
   1648    if (s->io.read) {
   1649       int blen = (int) (s->img_buffer_end - s->img_buffer);
   1650       if (blen < n) {
   1651          int res, count;
   1652 
   1653          memcpy(buffer, s->img_buffer, blen);
   1654 
   1655          count = (s->io.read)(s->io_user_data, (char*) buffer + blen, n - blen);
   1656          res = (count == (n-blen));
   1657          s->img_buffer = s->img_buffer_end;
   1658          return res;
   1659       }
   1660    }
   1661 
   1662    if (s->img_buffer+n <= s->img_buffer_end) {
   1663       memcpy(buffer, s->img_buffer, n);
   1664       s->img_buffer += n;
   1665       return 1;
   1666    } else
   1667       return 0;
   1668 }
   1669 #endif
   1670 
   1671 #if defined(STBI_NO_JPEG) && defined(STBI_NO_PNG) && defined(STBI_NO_PSD) && defined(STBI_NO_PIC)
   1672 // nothing
   1673 #else
   1674 static int stbi__get16be(stbi__context *s)
   1675 {
   1676    int z = stbi__get8(s);
   1677    return (z << 8) + stbi__get8(s);
   1678 }
   1679 #endif
   1680 
   1681 #if defined(STBI_NO_PNG) && defined(STBI_NO_PSD) && defined(STBI_NO_PIC)
   1682 // nothing
   1683 #else
   1684 static stbi__uint32 stbi__get32be(stbi__context *s)
   1685 {
   1686    stbi__uint32 z = stbi__get16be(s);
   1687    return (z << 16) + stbi__get16be(s);
   1688 }
   1689 #endif
   1690 
   1691 #if defined(STBI_NO_BMP) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF)
   1692 // nothing
   1693 #else
   1694 static int stbi__get16le(stbi__context *s)
   1695 {
   1696    int z = stbi__get8(s);
   1697    return z + (stbi__get8(s) << 8);
   1698 }
   1699 #endif
   1700 
   1701 #ifndef STBI_NO_BMP
   1702 static stbi__uint32 stbi__get32le(stbi__context *s)
   1703 {
   1704    stbi__uint32 z = stbi__get16le(s);
   1705    z += (stbi__uint32)stbi__get16le(s) << 16;
   1706    return z;
   1707 }
   1708 #endif
   1709 
   1710 #define STBI__BYTECAST(x)  ((stbi_uc) ((x) & 255))  // truncate int to byte without warnings
   1711 
   1712 #if defined(STBI_NO_JPEG) && defined(STBI_NO_PNG) && defined(STBI_NO_BMP) && defined(STBI_NO_PSD) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF) && defined(STBI_NO_PIC) && defined(STBI_NO_PNM)
   1713 // nothing
   1714 #else
   1715 //////////////////////////////////////////////////////////////////////////////
   1716 //
   1717 //  generic converter from built-in img_n to req_comp
   1718 //    individual types do this automatically as much as possible (e.g. jpeg
   1719 //    does all cases internally since it needs to colorspace convert anyway,
   1720 //    and it never has alpha, so very few cases ). png can automatically
   1721 //    interleave an alpha=255 channel, but falls back to this for other cases
   1722 //
   1723 //  assume data buffer is malloced, so malloc a new one and free that one
   1724 //  only failure mode is malloc failing
   1725 
   1726 static stbi_uc stbi__compute_y(int r, int g, int b)
   1727 {
   1728    return (stbi_uc) (((r*77) + (g*150) +  (29*b)) >> 8);
   1729 }
   1730 #endif
   1731 
   1732 #if defined(STBI_NO_PNG) && defined(STBI_NO_BMP) && defined(STBI_NO_PSD) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF) && defined(STBI_NO_PIC) && defined(STBI_NO_PNM)
   1733 // nothing
   1734 #else
   1735 static unsigned char *stbi__convert_format(unsigned char *data, int img_n, int req_comp, unsigned int x, unsigned int y)
   1736 {
   1737    int i,j;
   1738    unsigned char *good;
   1739 
   1740    if (req_comp == img_n) return data;
   1741    STBI_ASSERT(req_comp >= 1 && req_comp <= 4);
   1742 
   1743    good = (unsigned char *) stbi__malloc_mad3(req_comp, x, y, 0);
   1744    if (good == NULL) {
   1745       STBI_FREE(data);
   1746       return stbi__errpuc("outofmem", "Out of memory");
   1747    }
   1748 
   1749    for (j=0; j < (int) y; ++j) {
   1750       unsigned char *src  = data + j * x * img_n   ;
   1751       unsigned char *dest = good + j * x * req_comp;
   1752 
   1753       #define STBI__COMBO(a,b)  ((a)*8+(b))
   1754       #define STBI__CASE(a,b)   case STBI__COMBO(a,b): for(i=x-1; i >= 0; --i, src += a, dest += b)
   1755       // convert source image with img_n components to one with req_comp components;
   1756       // avoid switch per pixel, so use switch per scanline and massive macros
   1757       switch (STBI__COMBO(img_n, req_comp)) {
   1758          STBI__CASE(1,2) { dest[0]=src[0]; dest[1]=255;                                     } break;
   1759          STBI__CASE(1,3) { dest[0]=dest[1]=dest[2]=src[0];                                  } break;
   1760          STBI__CASE(1,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=255;                     } break;
   1761          STBI__CASE(2,1) { dest[0]=src[0];                                                  } break;
   1762          STBI__CASE(2,3) { dest[0]=dest[1]=dest[2]=src[0];                                  } break;
   1763          STBI__CASE(2,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=src[1];                  } break;
   1764          STBI__CASE(3,4) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];dest[3]=255;        } break;
   1765          STBI__CASE(3,1) { dest[0]=stbi__compute_y(src[0],src[1],src[2]);                   } break;
   1766          STBI__CASE(3,2) { dest[0]=stbi__compute_y(src[0],src[1],src[2]); dest[1] = 255;    } break;
   1767          STBI__CASE(4,1) { dest[0]=stbi__compute_y(src[0],src[1],src[2]);                   } break;
   1768          STBI__CASE(4,2) { dest[0]=stbi__compute_y(src[0],src[1],src[2]); dest[1] = src[3]; } break;
   1769          STBI__CASE(4,3) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];                    } break;
   1770          default: STBI_ASSERT(0); STBI_FREE(data); STBI_FREE(good); return stbi__errpuc("unsupported", "Unsupported format conversion");
   1771       }
   1772       #undef STBI__CASE
   1773    }
   1774 
   1775    STBI_FREE(data);
   1776    return good;
   1777 }
   1778 #endif
   1779 
   1780 #if defined(STBI_NO_PNG) && defined(STBI_NO_PSD)
   1781 // nothing
   1782 #else
   1783 static stbi__uint16 stbi__compute_y_16(int r, int g, int b)
   1784 {
   1785    return (stbi__uint16) (((r*77) + (g*150) +  (29*b)) >> 8);
   1786 }
   1787 #endif
   1788 
   1789 #if defined(STBI_NO_PNG) && defined(STBI_NO_PSD)
   1790 // nothing
   1791 #else
   1792 static stbi__uint16 *stbi__convert_format16(stbi__uint16 *data, int img_n, int req_comp, unsigned int x, unsigned int y)
   1793 {
   1794    int i,j;
   1795    stbi__uint16 *good;
   1796 
   1797    if (req_comp == img_n) return data;
   1798    STBI_ASSERT(req_comp >= 1 && req_comp <= 4);
   1799 
   1800    good = (stbi__uint16 *) stbi__malloc(req_comp * x * y * 2);
   1801    if (good == NULL) {
   1802       STBI_FREE(data);
   1803       return (stbi__uint16 *) stbi__errpuc("outofmem", "Out of memory");
   1804    }
   1805 
   1806    for (j=0; j < (int) y; ++j) {
   1807       stbi__uint16 *src  = data + j * x * img_n   ;
   1808       stbi__uint16 *dest = good + j * x * req_comp;
   1809 
   1810       #define STBI__COMBO(a,b)  ((a)*8+(b))
   1811       #define STBI__CASE(a,b)   case STBI__COMBO(a,b): for(i=x-1; i >= 0; --i, src += a, dest += b)
   1812       // convert source image with img_n components to one with req_comp components;
   1813       // avoid switch per pixel, so use switch per scanline and massive macros
   1814       switch (STBI__COMBO(img_n, req_comp)) {
   1815          STBI__CASE(1,2) { dest[0]=src[0]; dest[1]=0xffff;                                     } break;
   1816          STBI__CASE(1,3) { dest[0]=dest[1]=dest[2]=src[0];                                     } break;
   1817          STBI__CASE(1,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=0xffff;                     } break;
   1818          STBI__CASE(2,1) { dest[0]=src[0];                                                     } break;
   1819          STBI__CASE(2,3) { dest[0]=dest[1]=dest[2]=src[0];                                     } break;
   1820          STBI__CASE(2,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=src[1];                     } break;
   1821          STBI__CASE(3,4) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];dest[3]=0xffff;        } break;
   1822          STBI__CASE(3,1) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]);                   } break;
   1823          STBI__CASE(3,2) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]); dest[1] = 0xffff; } break;
   1824          STBI__CASE(4,1) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]);                   } break;
   1825          STBI__CASE(4,2) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]); dest[1] = src[3]; } break;
   1826          STBI__CASE(4,3) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];                       } break;
   1827          default: STBI_ASSERT(0); STBI_FREE(data); STBI_FREE(good); return (stbi__uint16*) stbi__errpuc("unsupported", "Unsupported format conversion");
   1828       }
   1829       #undef STBI__CASE
   1830    }
   1831 
   1832    STBI_FREE(data);
   1833    return good;
   1834 }
   1835 #endif
   1836 
   1837 #ifndef STBI_NO_LINEAR
   1838 static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp)
   1839 {
   1840    int i,k,n;
   1841    float *output;
   1842    if (!data) return NULL;
   1843    output = (float *) stbi__malloc_mad4(x, y, comp, sizeof(float), 0);
   1844    if (output == NULL) { STBI_FREE(data); return stbi__errpf("outofmem", "Out of memory"); }
   1845    // compute number of non-alpha components
   1846    if (comp & 1) n = comp; else n = comp-1;
   1847    for (i=0; i < x*y; ++i) {
   1848       for (k=0; k < n; ++k) {
   1849          output[i*comp + k] = (float) (pow(data[i*comp+k]/255.0f, stbi__l2h_gamma) * stbi__l2h_scale);
   1850       }
   1851    }
   1852    if (n < comp) {
   1853       for (i=0; i < x*y; ++i) {
   1854          output[i*comp + n] = data[i*comp + n]/255.0f;
   1855       }
   1856    }
   1857    STBI_FREE(data);
   1858    return output;
   1859 }
   1860 #endif
   1861 
   1862 #ifndef STBI_NO_HDR
   1863 #define stbi__float2int(x)   ((int) (x))
   1864 static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp)
   1865 {
   1866    int i,k,n;
   1867    stbi_uc *output;
   1868    if (!data) return NULL;
   1869    output = (stbi_uc *) stbi__malloc_mad3(x, y, comp, 0);
   1870    if (output == NULL) { STBI_FREE(data); return stbi__errpuc("outofmem", "Out of memory"); }
   1871    // compute number of non-alpha components
   1872    if (comp & 1) n = comp; else n = comp-1;
   1873    for (i=0; i < x*y; ++i) {
   1874       for (k=0; k < n; ++k) {
   1875          float z = (float) pow(data[i*comp+k]*stbi__h2l_scale_i, stbi__h2l_gamma_i) * 255 + 0.5f;
   1876          if (z < 0) z = 0;
   1877          if (z > 255) z = 255;
   1878          output[i*comp + k] = (stbi_uc) stbi__float2int(z);
   1879       }
   1880       if (k < comp) {
   1881          float z = data[i*comp+k] * 255 + 0.5f;
   1882          if (z < 0) z = 0;
   1883          if (z > 255) z = 255;
   1884          output[i*comp + k] = (stbi_uc) stbi__float2int(z);
   1885       }
   1886    }
   1887    STBI_FREE(data);
   1888    return output;
   1889 }
   1890 #endif
   1891 
   1892 //////////////////////////////////////////////////////////////////////////////
   1893 //
   1894 //  "baseline" JPEG/JFIF decoder
   1895 //
   1896 //    simple implementation
   1897 //      - doesn't support delayed output of y-dimension
   1898 //      - simple interface (only one output format: 8-bit interleaved RGB)
   1899 //      - doesn't try to recover corrupt jpegs
   1900 //      - doesn't allow partial loading, loading multiple at once
   1901 //      - still fast on x86 (copying globals into locals doesn't help x86)
   1902 //      - allocates lots of intermediate memory (full size of all components)
   1903 //        - non-interleaved case requires this anyway
   1904 //        - allows good upsampling (see next)
   1905 //    high-quality
   1906 //      - upsampled channels are bilinearly interpolated, even across blocks
   1907 //      - quality integer IDCT derived from IJG's 'slow'
   1908 //    performance
   1909 //      - fast huffman; reasonable integer IDCT
   1910 //      - some SIMD kernels for common paths on targets with SSE2/NEON
   1911 //      - uses a lot of intermediate memory, could cache poorly
   1912 
   1913 #ifndef STBI_NO_JPEG
   1914 
   1915 // huffman decoding acceleration
   1916 #define FAST_BITS   9  // larger handles more cases; smaller stomps less cache
   1917 
   1918 typedef struct
   1919 {
   1920    stbi_uc  fast[1 << FAST_BITS];
   1921    // weirdly, repacking this into AoS is a 10% speed loss, instead of a win
   1922    stbi__uint16 code[256];
   1923    stbi_uc  values[256];
   1924    stbi_uc  size[257];
   1925    unsigned int maxcode[18];
   1926    int    delta[17];   // old 'firstsymbol' - old 'firstcode'
   1927 } stbi__huffman;
   1928 
   1929 typedef struct
   1930 {
   1931    stbi__context *s;
   1932    stbi__huffman huff_dc[4];
   1933    stbi__huffman huff_ac[4];
   1934    stbi__uint16 dequant[4][64];
   1935    stbi__int16 fast_ac[4][1 << FAST_BITS];
   1936 
   1937 // sizes for components, interleaved MCUs
   1938    int img_h_max, img_v_max;
   1939    int img_mcu_x, img_mcu_y;
   1940    int img_mcu_w, img_mcu_h;
   1941 
   1942 // definition of jpeg image component
   1943    struct
   1944    {
   1945       int id;
   1946       int h,v;
   1947       int tq;
   1948       int hd,ha;
   1949       int dc_pred;
   1950 
   1951       int x,y,w2,h2;
   1952       stbi_uc *data;
   1953       void *raw_data, *raw_coeff;
   1954       stbi_uc *linebuf;
   1955       short   *coeff;   // progressive only
   1956       int      coeff_w, coeff_h; // number of 8x8 coefficient blocks
   1957    } img_comp[4];
   1958 
   1959    stbi__uint32   code_buffer; // jpeg entropy-coded buffer
   1960    int            code_bits;   // number of valid bits
   1961    unsigned char  marker;      // marker seen while filling entropy buffer
   1962    int            nomore;      // flag if we saw a marker so must stop
   1963 
   1964    int            progressive;
   1965    int            spec_start;
   1966    int            spec_end;
   1967    int            succ_high;
   1968    int            succ_low;
   1969    int            eob_run;
   1970    int            jfif;
   1971    int            app14_color_transform; // Adobe APP14 tag
   1972    int            rgb;
   1973 
   1974    int scan_n, order[4];
   1975    int restart_interval, todo;
   1976 
   1977 // kernels
   1978    void (*idct_block_kernel)(stbi_uc *out, int out_stride, short data[64]);
   1979    void (*YCbCr_to_RGB_kernel)(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step);
   1980    stbi_uc *(*resample_row_hv_2_kernel)(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs);
   1981 } stbi__jpeg;
   1982 
   1983 static int stbi__build_huffman(stbi__huffman *h, int *count)
   1984 {
   1985    int i,j,k=0;
   1986    unsigned int code;
   1987    // build size list for each symbol (from JPEG spec)
   1988    for (i=0; i < 16; ++i)
   1989       for (j=0; j < count[i]; ++j)
   1990          h->size[k++] = (stbi_uc) (i+1);
   1991    h->size[k] = 0;
   1992 
   1993    // compute actual symbols (from jpeg spec)
   1994    code = 0;
   1995    k = 0;
   1996    for(j=1; j <= 16; ++j) {
   1997       // compute delta to add to code to compute symbol id
   1998       h->delta[j] = k - code;
   1999       if (h->size[k] == j) {
   2000          while (h->size[k] == j)
   2001             h->code[k++] = (stbi__uint16) (code++);
   2002          if (code-1 >= (1u << j)) return stbi__err("bad code lengths","Corrupt JPEG");
   2003       }
   2004       // compute largest code + 1 for this size, preshifted as needed later
   2005       h->maxcode[j] = code << (16-j);
   2006       code <<= 1;
   2007    }
   2008    h->maxcode[j] = 0xffffffff;
   2009 
   2010    // build non-spec acceleration table; 255 is flag for not-accelerated
   2011    memset(h->fast, 255, 1 << FAST_BITS);
   2012    for (i=0; i < k; ++i) {
   2013       int s = h->size[i];
   2014       if (s <= FAST_BITS) {
   2015          int c = h->code[i] << (FAST_BITS-s);
   2016          int m = 1 << (FAST_BITS-s);
   2017          for (j=0; j < m; ++j) {
   2018             h->fast[c+j] = (stbi_uc) i;
   2019          }
   2020       }
   2021    }
   2022    return 1;
   2023 }
   2024 
   2025 // build a table that decodes both magnitude and value of small ACs in
   2026 // one go.
   2027 static void stbi__build_fast_ac(stbi__int16 *fast_ac, stbi__huffman *h)
   2028 {
   2029    int i;
   2030    for (i=0; i < (1 << FAST_BITS); ++i) {
   2031       stbi_uc fast = h->fast[i];
   2032       fast_ac[i] = 0;
   2033       if (fast < 255) {
   2034          int rs = h->values[fast];
   2035          int run = (rs >> 4) & 15;
   2036          int magbits = rs & 15;
   2037          int len = h->size[fast];
   2038 
   2039          if (magbits && len + magbits <= FAST_BITS) {
   2040             // magnitude code followed by receive_extend code
   2041             int k = ((i << len) & ((1 << FAST_BITS) - 1)) >> (FAST_BITS - magbits);
   2042             int m = 1 << (magbits - 1);
   2043             if (k < m) k += (~0U << magbits) + 1;
   2044             // if the result is small enough, we can fit it in fast_ac table
   2045             if (k >= -128 && k <= 127)
   2046                fast_ac[i] = (stbi__int16) ((k * 256) + (run * 16) + (len + magbits));
   2047          }
   2048       }
   2049    }
   2050 }
   2051 
   2052 static void stbi__grow_buffer_unsafe(stbi__jpeg *j)
   2053 {
   2054    do {
   2055       unsigned int b = j->nomore ? 0 : stbi__get8(j->s);
   2056       if (b == 0xff) {
   2057          int c = stbi__get8(j->s);
   2058          while (c == 0xff) c = stbi__get8(j->s); // consume fill bytes
   2059          if (c != 0) {
   2060             j->marker = (unsigned char) c;
   2061             j->nomore = 1;
   2062             return;
   2063          }
   2064       }
   2065       j->code_buffer |= b << (24 - j->code_bits);
   2066       j->code_bits += 8;
   2067    } while (j->code_bits <= 24);
   2068 }
   2069 
   2070 // (1 << n) - 1
   2071 static const stbi__uint32 stbi__bmask[17]={0,1,3,7,15,31,63,127,255,511,1023,2047,4095,8191,16383,32767,65535};
   2072 
   2073 // decode a jpeg huffman value from the bitstream
   2074 stbi_inline static int stbi__jpeg_huff_decode(stbi__jpeg *j, stbi__huffman *h)
   2075 {
   2076    unsigned int temp;
   2077    int c,k;
   2078 
   2079    if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
   2080 
   2081    // look at the top FAST_BITS and determine what symbol ID it is,
   2082    // if the code is <= FAST_BITS
   2083    c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
   2084    k = h->fast[c];
   2085    if (k < 255) {
   2086       int s = h->size[k];
   2087       if (s > j->code_bits)
   2088          return -1;
   2089       j->code_buffer <<= s;
   2090       j->code_bits -= s;
   2091       return h->values[k];
   2092    }
   2093 
   2094    // naive test is to shift the code_buffer down so k bits are
   2095    // valid, then test against maxcode. To speed this up, we've
   2096    // preshifted maxcode left so that it has (16-k) 0s at the
   2097    // end; in other words, regardless of the number of bits, it
   2098    // wants to be compared against something shifted to have 16;
   2099    // that way we don't need to shift inside the loop.
   2100    temp = j->code_buffer >> 16;
   2101    for (k=FAST_BITS+1 ; ; ++k)
   2102       if (temp < h->maxcode[k])
   2103          break;
   2104    if (k == 17) {
   2105       // error! code not found
   2106       j->code_bits -= 16;
   2107       return -1;
   2108    }
   2109 
   2110    if (k > j->code_bits)
   2111       return -1;
   2112 
   2113    // convert the huffman code to the symbol id
   2114    c = ((j->code_buffer >> (32 - k)) & stbi__bmask[k]) + h->delta[k];
   2115    STBI_ASSERT((((j->code_buffer) >> (32 - h->size[c])) & stbi__bmask[h->size[c]]) == h->code[c]);
   2116 
   2117    // convert the id to a symbol
   2118    j->code_bits -= k;
   2119    j->code_buffer <<= k;
   2120    return h->values[c];
   2121 }
   2122 
   2123 // bias[n] = (-1<<n) + 1
   2124 static const int stbi__jbias[16] = {0,-1,-3,-7,-15,-31,-63,-127,-255,-511,-1023,-2047,-4095,-8191,-16383,-32767};
   2125 
   2126 // combined JPEG 'receive' and JPEG 'extend', since baseline
   2127 // always extends everything it receives.
   2128 stbi_inline static int stbi__extend_receive(stbi__jpeg *j, int n)
   2129 {
   2130    unsigned int k;
   2131    int sgn;
   2132    if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
   2133 
   2134    sgn = j->code_buffer >> 31; // sign bit always in MSB; 0 if MSB clear (positive), 1 if MSB set (negative)
   2135    k = stbi_lrot(j->code_buffer, n);
   2136    j->code_buffer = k & ~stbi__bmask[n];
   2137    k &= stbi__bmask[n];
   2138    j->code_bits -= n;
   2139    return k + (stbi__jbias[n] & (sgn - 1));
   2140 }
   2141 
   2142 // get some unsigned bits
   2143 stbi_inline static int stbi__jpeg_get_bits(stbi__jpeg *j, int n)
   2144 {
   2145    unsigned int k;
   2146    if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
   2147    k = stbi_lrot(j->code_buffer, n);
   2148    j->code_buffer = k & ~stbi__bmask[n];
   2149    k &= stbi__bmask[n];
   2150    j->code_bits -= n;
   2151    return k;
   2152 }
   2153 
   2154 stbi_inline static int stbi__jpeg_get_bit(stbi__jpeg *j)
   2155 {
   2156    unsigned int k;
   2157    if (j->code_bits < 1) stbi__grow_buffer_unsafe(j);
   2158    k = j->code_buffer;
   2159    j->code_buffer <<= 1;
   2160    --j->code_bits;
   2161    return k & 0x80000000;
   2162 }
   2163 
   2164 // given a value that's at position X in the zigzag stream,
   2165 // where does it appear in the 8x8 matrix coded as row-major?
   2166 static const stbi_uc stbi__jpeg_dezigzag[64+15] =
   2167 {
   2168     0,  1,  8, 16,  9,  2,  3, 10,
   2169    17, 24, 32, 25, 18, 11,  4,  5,
   2170    12, 19, 26, 33, 40, 48, 41, 34,
   2171    27, 20, 13,  6,  7, 14, 21, 28,
   2172    35, 42, 49, 56, 57, 50, 43, 36,
   2173    29, 22, 15, 23, 30, 37, 44, 51,
   2174    58, 59, 52, 45, 38, 31, 39, 46,
   2175    53, 60, 61, 54, 47, 55, 62, 63,
   2176    // let corrupt input sample past end
   2177    63, 63, 63, 63, 63, 63, 63, 63,
   2178    63, 63, 63, 63, 63, 63, 63
   2179 };
   2180 
   2181 // decode one 64-entry block--
   2182 static int stbi__jpeg_decode_block(stbi__jpeg *j, short data[64], stbi__huffman *hdc, stbi__huffman *hac, stbi__int16 *fac, int b, stbi__uint16 *dequant)
   2183 {
   2184    int diff,dc,k;
   2185    int t;
   2186 
   2187    if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
   2188    t = stbi__jpeg_huff_decode(j, hdc);
   2189    if (t < 0 || t > 15) return stbi__err("bad huffman code","Corrupt JPEG");
   2190 
   2191    // 0 all the ac values now so we can do it 32-bits at a time
   2192    memset(data,0,64*sizeof(data[0]));
   2193 
   2194    diff = t ? stbi__extend_receive(j, t) : 0;
   2195    dc = j->img_comp[b].dc_pred + diff;
   2196    j->img_comp[b].dc_pred = dc;
   2197    data[0] = (short) (dc * dequant[0]);
   2198 
   2199    // decode AC components, see JPEG spec
   2200    k = 1;
   2201    do {
   2202       unsigned int zig;
   2203       int c,r,s;
   2204       if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
   2205       c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
   2206       r = fac[c];
   2207       if (r) { // fast-AC path
   2208          k += (r >> 4) & 15; // run
   2209          s = r & 15; // combined length
   2210          j->code_buffer <<= s;
   2211          j->code_bits -= s;
   2212          // decode into unzigzag'd location
   2213          zig = stbi__jpeg_dezigzag[k++];
   2214          data[zig] = (short) ((r >> 8) * dequant[zig]);
   2215       } else {
   2216          int rs = stbi__jpeg_huff_decode(j, hac);
   2217          if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
   2218          s = rs & 15;
   2219          r = rs >> 4;
   2220          if (s == 0) {
   2221             if (rs != 0xf0) break; // end block
   2222             k += 16;
   2223          } else {
   2224             k += r;
   2225             // decode into unzigzag'd location
   2226             zig = stbi__jpeg_dezigzag[k++];
   2227             data[zig] = (short) (stbi__extend_receive(j,s) * dequant[zig]);
   2228          }
   2229       }
   2230    } while (k < 64);
   2231    return 1;
   2232 }
   2233 
   2234 static int stbi__jpeg_decode_block_prog_dc(stbi__jpeg *j, short data[64], stbi__huffman *hdc, int b)
   2235 {
   2236    int diff,dc;
   2237    int t;
   2238    if (j->spec_end != 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
   2239 
   2240    if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
   2241 
   2242    if (j->succ_high == 0) {
   2243       // first scan for DC coefficient, must be first
   2244       memset(data,0,64*sizeof(data[0])); // 0 all the ac values now
   2245       t = stbi__jpeg_huff_decode(j, hdc);
   2246       if (t < 0 || t > 15) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
   2247       diff = t ? stbi__extend_receive(j, t) : 0;
   2248 
   2249       dc = j->img_comp[b].dc_pred + diff;
   2250       j->img_comp[b].dc_pred = dc;
   2251       data[0] = (short) (dc * (1 << j->succ_low));
   2252    } else {
   2253       // refinement scan for DC coefficient
   2254       if (stbi__jpeg_get_bit(j))
   2255          data[0] += (short) (1 << j->succ_low);
   2256    }
   2257    return 1;
   2258 }
   2259 
   2260 // @OPTIMIZE: store non-zigzagged during the decode passes,
   2261 // and only de-zigzag when dequantizing
   2262 static int stbi__jpeg_decode_block_prog_ac(stbi__jpeg *j, short data[64], stbi__huffman *hac, stbi__int16 *fac)
   2263 {
   2264    int k;
   2265    if (j->spec_start == 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
   2266 
   2267    if (j->succ_high == 0) {
   2268       int shift = j->succ_low;
   2269 
   2270       if (j->eob_run) {
   2271          --j->eob_run;
   2272          return 1;
   2273       }
   2274 
   2275       k = j->spec_start;
   2276       do {
   2277          unsigned int zig;
   2278          int c,r,s;
   2279          if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
   2280          c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
   2281          r = fac[c];
   2282          if (r) { // fast-AC path
   2283             k += (r >> 4) & 15; // run
   2284             s = r & 15; // combined length
   2285             j->code_buffer <<= s;
   2286             j->code_bits -= s;
   2287             zig = stbi__jpeg_dezigzag[k++];
   2288             data[zig] = (short) ((r >> 8) * (1 << shift));
   2289          } else {
   2290             int rs = stbi__jpeg_huff_decode(j, hac);
   2291             if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
   2292             s = rs & 15;
   2293             r = rs >> 4;
   2294             if (s == 0) {
   2295                if (r < 15) {
   2296                   j->eob_run = (1 << r);
   2297                   if (r)
   2298                      j->eob_run += stbi__jpeg_get_bits(j, r);
   2299                   --j->eob_run;
   2300                   break;
   2301                }
   2302                k += 16;
   2303             } else {
   2304                k += r;
   2305                zig = stbi__jpeg_dezigzag[k++];
   2306                data[zig] = (short) (stbi__extend_receive(j,s) * (1 << shift));
   2307             }
   2308          }
   2309       } while (k <= j->spec_end);
   2310    } else {
   2311       // refinement scan for these AC coefficients
   2312 
   2313       short bit = (short) (1 << j->succ_low);
   2314 
   2315       if (j->eob_run) {
   2316          --j->eob_run;
   2317          for (k = j->spec_start; k <= j->spec_end; ++k) {
   2318             short *p = &data[stbi__jpeg_dezigzag[k]];
   2319             if (*p != 0)
   2320                if (stbi__jpeg_get_bit(j))
   2321                   if ((*p & bit)==0) {
   2322                      if (*p > 0)
   2323                         *p += bit;
   2324                      else
   2325                         *p -= bit;
   2326                   }
   2327          }
   2328       } else {
   2329          k = j->spec_start;
   2330          do {
   2331             int r,s;
   2332             int rs = stbi__jpeg_huff_decode(j, hac); // @OPTIMIZE see if we can use the fast path here, advance-by-r is so slow, eh
   2333             if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
   2334             s = rs & 15;
   2335             r = rs >> 4;
   2336             if (s == 0) {
   2337                if (r < 15) {
   2338                   j->eob_run = (1 << r) - 1;
   2339                   if (r)
   2340                      j->eob_run += stbi__jpeg_get_bits(j, r);
   2341                   r = 64; // force end of block
   2342                } else {
   2343                   // r=15 s=0 should write 16 0s, so we just do
   2344                   // a run of 15 0s and then write s (which is 0),
   2345                   // so we don't have to do anything special here
   2346                }
   2347             } else {
   2348                if (s != 1) return stbi__err("bad huffman code", "Corrupt JPEG");
   2349                // sign bit
   2350                if (stbi__jpeg_get_bit(j))
   2351                   s = bit;
   2352                else
   2353                   s = -bit;
   2354             }
   2355 
   2356             // advance by r
   2357             while (k <= j->spec_end) {
   2358                short *p = &data[stbi__jpeg_dezigzag[k++]];
   2359                if (*p != 0) {
   2360                   if (stbi__jpeg_get_bit(j))
   2361                      if ((*p & bit)==0) {
   2362                         if (*p > 0)
   2363                            *p += bit;
   2364                         else
   2365                            *p -= bit;
   2366                      }
   2367                } else {
   2368                   if (r == 0) {
   2369                      *p = (short) s;
   2370                      break;
   2371                   }
   2372                   --r;
   2373                }
   2374             }
   2375          } while (k <= j->spec_end);
   2376       }
   2377    }
   2378    return 1;
   2379 }
   2380 
   2381 // take a -128..127 value and stbi__clamp it and convert to 0..255
   2382 stbi_inline static stbi_uc stbi__clamp(int x)
   2383 {
   2384    // trick to use a single test to catch both cases
   2385    if ((unsigned int) x > 255) {
   2386       if (x < 0) return 0;
   2387       if (x > 255) return 255;
   2388    }
   2389    return (stbi_uc) x;
   2390 }
   2391 
   2392 #define stbi__f2f(x)  ((int) (((x) * 4096 + 0.5)))
   2393 #define stbi__fsh(x)  ((x) * 4096)
   2394 
   2395 // derived from jidctint -- DCT_ISLOW
   2396 #define STBI__IDCT_1D(s0,s1,s2,s3,s4,s5,s6,s7) \
   2397    int t0,t1,t2,t3,p1,p2,p3,p4,p5,x0,x1,x2,x3; \
   2398    p2 = s2;                                    \
   2399    p3 = s6;                                    \
   2400    p1 = (p2+p3) * stbi__f2f(0.5411961f);       \
   2401    t2 = p1 + p3*stbi__f2f(-1.847759065f);      \
   2402    t3 = p1 + p2*stbi__f2f( 0.765366865f);      \
   2403    p2 = s0;                                    \
   2404    p3 = s4;                                    \
   2405    t0 = stbi__fsh(p2+p3);                      \
   2406    t1 = stbi__fsh(p2-p3);                      \
   2407    x0 = t0+t3;                                 \
   2408    x3 = t0-t3;                                 \
   2409    x1 = t1+t2;                                 \
   2410    x2 = t1-t2;                                 \
   2411    t0 = s7;                                    \
   2412    t1 = s5;                                    \
   2413    t2 = s3;                                    \
   2414    t3 = s1;                                    \
   2415    p3 = t0+t2;                                 \
   2416    p4 = t1+t3;                                 \
   2417    p1 = t0+t3;                                 \
   2418    p2 = t1+t2;                                 \
   2419    p5 = (p3+p4)*stbi__f2f( 1.175875602f);      \
   2420    t0 = t0*stbi__f2f( 0.298631336f);           \
   2421    t1 = t1*stbi__f2f( 2.053119869f);           \
   2422    t2 = t2*stbi__f2f( 3.072711026f);           \
   2423    t3 = t3*stbi__f2f( 1.501321110f);           \
   2424    p1 = p5 + p1*stbi__f2f(-0.899976223f);      \
   2425    p2 = p5 + p2*stbi__f2f(-2.562915447f);      \
   2426    p3 = p3*stbi__f2f(-1.961570560f);           \
   2427    p4 = p4*stbi__f2f(-0.390180644f);           \
   2428    t3 += p1+p4;                                \
   2429    t2 += p2+p3;                                \
   2430    t1 += p2+p4;                                \
   2431    t0 += p1+p3;
   2432 
   2433 static void stbi__idct_block(stbi_uc *out, int out_stride, short data[64])
   2434 {
   2435    int i,val[64],*v=val;
   2436    stbi_uc *o;
   2437    short *d = data;
   2438 
   2439    // columns
   2440    for (i=0; i < 8; ++i,++d, ++v) {
   2441       // if all zeroes, shortcut -- this avoids dequantizing 0s and IDCTing
   2442       if (d[ 8]==0 && d[16]==0 && d[24]==0 && d[32]==0
   2443            && d[40]==0 && d[48]==0 && d[56]==0) {
   2444          //    no shortcut                 0     seconds
   2445          //    (1|2|3|4|5|6|7)==0          0     seconds
   2446          //    all separate               -0.047 seconds
   2447          //    1 && 2|3 && 4|5 && 6|7:    -0.047 seconds
   2448          int dcterm = d[0]*4;
   2449          v[0] = v[8] = v[16] = v[24] = v[32] = v[40] = v[48] = v[56] = dcterm;
   2450       } else {
   2451          STBI__IDCT_1D(d[ 0],d[ 8],d[16],d[24],d[32],d[40],d[48],d[56])
   2452          // constants scaled things up by 1<<12; let's bring them back
   2453          // down, but keep 2 extra bits of precision
   2454          x0 += 512; x1 += 512; x2 += 512; x3 += 512;
   2455          v[ 0] = (x0+t3) >> 10;
   2456          v[56] = (x0-t3) >> 10;
   2457          v[ 8] = (x1+t2) >> 10;
   2458          v[48] = (x1-t2) >> 10;
   2459          v[16] = (x2+t1) >> 10;
   2460          v[40] = (x2-t1) >> 10;
   2461          v[24] = (x3+t0) >> 10;
   2462          v[32] = (x3-t0) >> 10;
   2463       }
   2464    }
   2465 
   2466    for (i=0, v=val, o=out; i < 8; ++i,v+=8,o+=out_stride) {
   2467       // no fast case since the first 1D IDCT spread components out
   2468       STBI__IDCT_1D(v[0],v[1],v[2],v[3],v[4],v[5],v[6],v[7])
   2469       // constants scaled things up by 1<<12, plus we had 1<<2 from first
   2470       // loop, plus horizontal and vertical each scale by sqrt(8) so together
   2471       // we've got an extra 1<<3, so 1<<17 total we need to remove.
   2472       // so we want to round that, which means adding 0.5 * 1<<17,
   2473       // aka 65536. Also, we'll end up with -128 to 127 that we want
   2474       // to encode as 0..255 by adding 128, so we'll add that before the shift
   2475       x0 += 65536 + (128<<17);
   2476       x1 += 65536 + (128<<17);
   2477       x2 += 65536 + (128<<17);
   2478       x3 += 65536 + (128<<17);
   2479       // tried computing the shifts into temps, or'ing the temps to see
   2480       // if any were out of range, but that was slower
   2481       o[0] = stbi__clamp((x0+t3) >> 17);
   2482       o[7] = stbi__clamp((x0-t3) >> 17);
   2483       o[1] = stbi__clamp((x1+t2) >> 17);
   2484       o[6] = stbi__clamp((x1-t2) >> 17);
   2485       o[2] = stbi__clamp((x2+t1) >> 17);
   2486       o[5] = stbi__clamp((x2-t1) >> 17);
   2487       o[3] = stbi__clamp((x3+t0) >> 17);
   2488       o[4] = stbi__clamp((x3-t0) >> 17);
   2489    }
   2490 }
   2491 
   2492 #ifdef STBI_SSE2
   2493 // sse2 integer IDCT. not the fastest possible implementation but it
   2494 // produces bit-identical results to the generic C version so it's
   2495 // fully "transparent".
   2496 static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
   2497 {
   2498    // This is constructed to match our regular (generic) integer IDCT exactly.
   2499    __m128i row0, row1, row2, row3, row4, row5, row6, row7;
   2500    __m128i tmp;
   2501 
   2502    // dot product constant: even elems=x, odd elems=y
   2503    #define dct_const(x,y)  _mm_setr_epi16((x),(y),(x),(y),(x),(y),(x),(y))
   2504 
   2505    // out(0) = c0[even]*x + c0[odd]*y   (c0, x, y 16-bit, out 32-bit)
   2506    // out(1) = c1[even]*x + c1[odd]*y
   2507    #define dct_rot(out0,out1, x,y,c0,c1) \
   2508       __m128i c0##lo = _mm_unpacklo_epi16((x),(y)); \
   2509       __m128i c0##hi = _mm_unpackhi_epi16((x),(y)); \
   2510       __m128i out0##_l = _mm_madd_epi16(c0##lo, c0); \
   2511       __m128i out0##_h = _mm_madd_epi16(c0##hi, c0); \
   2512       __m128i out1##_l = _mm_madd_epi16(c0##lo, c1); \
   2513       __m128i out1##_h = _mm_madd_epi16(c0##hi, c1)
   2514 
   2515    // out = in << 12  (in 16-bit, out 32-bit)
   2516    #define dct_widen(out, in) \
   2517       __m128i out##_l = _mm_srai_epi32(_mm_unpacklo_epi16(_mm_setzero_si128(), (in)), 4); \
   2518       __m128i out##_h = _mm_srai_epi32(_mm_unpackhi_epi16(_mm_setzero_si128(), (in)), 4)
   2519 
   2520    // wide add
   2521    #define dct_wadd(out, a, b) \
   2522       __m128i out##_l = _mm_add_epi32(a##_l, b##_l); \
   2523       __m128i out##_h = _mm_add_epi32(a##_h, b##_h)
   2524 
   2525    // wide sub
   2526    #define dct_wsub(out, a, b) \
   2527       __m128i out##_l = _mm_sub_epi32(a##_l, b##_l); \
   2528       __m128i out##_h = _mm_sub_epi32(a##_h, b##_h)
   2529 
   2530    // butterfly a/b, add bias, then shift by "s" and pack
   2531    #define dct_bfly32o(out0, out1, a,b,bias,s) \
   2532       { \
   2533          __m128i abiased_l = _mm_add_epi32(a##_l, bias); \
   2534          __m128i abiased_h = _mm_add_epi32(a##_h, bias); \
   2535          dct_wadd(sum, abiased, b); \
   2536          dct_wsub(dif, abiased, b); \
   2537          out0 = _mm_packs_epi32(_mm_srai_epi32(sum_l, s), _mm_srai_epi32(sum_h, s)); \
   2538          out1 = _mm_packs_epi32(_mm_srai_epi32(dif_l, s), _mm_srai_epi32(dif_h, s)); \
   2539       }
   2540 
   2541    // 8-bit interleave step (for transposes)
   2542    #define dct_interleave8(a, b) \
   2543       tmp = a; \
   2544       a = _mm_unpacklo_epi8(a, b); \
   2545       b = _mm_unpackhi_epi8(tmp, b)
   2546 
   2547    // 16-bit interleave step (for transposes)
   2548    #define dct_interleave16(a, b) \
   2549       tmp = a; \
   2550       a = _mm_unpacklo_epi16(a, b); \
   2551       b = _mm_unpackhi_epi16(tmp, b)
   2552 
   2553    #define dct_pass(bias,shift) \
   2554       { \
   2555          /* even part */ \
   2556          dct_rot(t2e,t3e, row2,row6, rot0_0,rot0_1); \
   2557          __m128i sum04 = _mm_add_epi16(row0, row4); \
   2558          __m128i dif04 = _mm_sub_epi16(row0, row4); \
   2559          dct_widen(t0e, sum04); \
   2560          dct_widen(t1e, dif04); \
   2561          dct_wadd(x0, t0e, t3e); \
   2562          dct_wsub(x3, t0e, t3e); \
   2563          dct_wadd(x1, t1e, t2e); \
   2564          dct_wsub(x2, t1e, t2e); \
   2565          /* odd part */ \
   2566          dct_rot(y0o,y2o, row7,row3, rot2_0,rot2_1); \
   2567          dct_rot(y1o,y3o, row5,row1, rot3_0,rot3_1); \
   2568          __m128i sum17 = _mm_add_epi16(row1, row7); \
   2569          __m128i sum35 = _mm_add_epi16(row3, row5); \
   2570          dct_rot(y4o,y5o, sum17,sum35, rot1_0,rot1_1); \
   2571          dct_wadd(x4, y0o, y4o); \
   2572          dct_wadd(x5, y1o, y5o); \
   2573          dct_wadd(x6, y2o, y5o); \
   2574          dct_wadd(x7, y3o, y4o); \
   2575          dct_bfly32o(row0,row7, x0,x7,bias,shift); \
   2576          dct_bfly32o(row1,row6, x1,x6,bias,shift); \
   2577          dct_bfly32o(row2,row5, x2,x5,bias,shift); \
   2578          dct_bfly32o(row3,row4, x3,x4,bias,shift); \
   2579       }
   2580 
   2581    __m128i rot0_0 = dct_const(stbi__f2f(0.5411961f), stbi__f2f(0.5411961f) + stbi__f2f(-1.847759065f));
   2582    __m128i rot0_1 = dct_const(stbi__f2f(0.5411961f) + stbi__f2f( 0.765366865f), stbi__f2f(0.5411961f));
   2583    __m128i rot1_0 = dct_const(stbi__f2f(1.175875602f) + stbi__f2f(-0.899976223f), stbi__f2f(1.175875602f));
   2584    __m128i rot1_1 = dct_const(stbi__f2f(1.175875602f), stbi__f2f(1.175875602f) + stbi__f2f(-2.562915447f));
   2585    __m128i rot2_0 = dct_const(stbi__f2f(-1.961570560f) + stbi__f2f( 0.298631336f), stbi__f2f(-1.961570560f));
   2586    __m128i rot2_1 = dct_const(stbi__f2f(-1.961570560f), stbi__f2f(-1.961570560f) + stbi__f2f( 3.072711026f));
   2587    __m128i rot3_0 = dct_const(stbi__f2f(-0.390180644f) + stbi__f2f( 2.053119869f), stbi__f2f(-0.390180644f));
   2588    __m128i rot3_1 = dct_const(stbi__f2f(-0.390180644f), stbi__f2f(-0.390180644f) + stbi__f2f( 1.501321110f));
   2589 
   2590    // rounding biases in column/row passes, see stbi__idct_block for explanation.
   2591    __m128i bias_0 = _mm_set1_epi32(512);
   2592    __m128i bias_1 = _mm_set1_epi32(65536 + (128<<17));
   2593 
   2594    // load
   2595    row0 = _mm_load_si128((const __m128i *) (data + 0*8));
   2596    row1 = _mm_load_si128((const __m128i *) (data + 1*8));
   2597    row2 = _mm_load_si128((const __m128i *) (data + 2*8));
   2598    row3 = _mm_load_si128((const __m128i *) (data + 3*8));
   2599    row4 = _mm_load_si128((const __m128i *) (data + 4*8));
   2600    row5 = _mm_load_si128((const __m128i *) (data + 5*8));
   2601    row6 = _mm_load_si128((const __m128i *) (data + 6*8));
   2602    row7 = _mm_load_si128((const __m128i *) (data + 7*8));
   2603 
   2604    // column pass
   2605    dct_pass(bias_0, 10);
   2606 
   2607    {
   2608       // 16bit 8x8 transpose pass 1
   2609       dct_interleave16(row0, row4);
   2610       dct_interleave16(row1, row5);
   2611       dct_interleave16(row2, row6);
   2612       dct_interleave16(row3, row7);
   2613 
   2614       // transpose pass 2
   2615       dct_interleave16(row0, row2);
   2616       dct_interleave16(row1, row3);
   2617       dct_interleave16(row4, row6);
   2618       dct_interleave16(row5, row7);
   2619 
   2620       // transpose pass 3
   2621       dct_interleave16(row0, row1);
   2622       dct_interleave16(row2, row3);
   2623       dct_interleave16(row4, row5);
   2624       dct_interleave16(row6, row7);
   2625    }
   2626 
   2627    // row pass
   2628    dct_pass(bias_1, 17);
   2629 
   2630    {
   2631       // pack
   2632       __m128i p0 = _mm_packus_epi16(row0, row1); // a0a1a2a3...a7b0b1b2b3...b7
   2633       __m128i p1 = _mm_packus_epi16(row2, row3);
   2634       __m128i p2 = _mm_packus_epi16(row4, row5);
   2635       __m128i p3 = _mm_packus_epi16(row6, row7);
   2636 
   2637       // 8bit 8x8 transpose pass 1
   2638       dct_interleave8(p0, p2); // a0e0a1e1...
   2639       dct_interleave8(p1, p3); // c0g0c1g1...
   2640 
   2641       // transpose pass 2
   2642       dct_interleave8(p0, p1); // a0c0e0g0...
   2643       dct_interleave8(p2, p3); // b0d0f0h0...
   2644 
   2645       // transpose pass 3
   2646       dct_interleave8(p0, p2); // a0b0c0d0...
   2647       dct_interleave8(p1, p3); // a4b4c4d4...
   2648 
   2649       // store
   2650       _mm_storel_epi64((__m128i *) out, p0); out += out_stride;
   2651       _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p0, 0x4e)); out += out_stride;
   2652       _mm_storel_epi64((__m128i *) out, p2); out += out_stride;
   2653       _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p2, 0x4e)); out += out_stride;
   2654       _mm_storel_epi64((__m128i *) out, p1); out += out_stride;
   2655       _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p1, 0x4e)); out += out_stride;
   2656       _mm_storel_epi64((__m128i *) out, p3); out += out_stride;
   2657       _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p3, 0x4e));
   2658    }
   2659 
   2660 #undef dct_const
   2661 #undef dct_rot
   2662 #undef dct_widen
   2663 #undef dct_wadd
   2664 #undef dct_wsub
   2665 #undef dct_bfly32o
   2666 #undef dct_interleave8
   2667 #undef dct_interleave16
   2668 #undef dct_pass
   2669 }
   2670 
   2671 #endif // STBI_SSE2
   2672 
   2673 #ifdef STBI_NEON
   2674 
   2675 // NEON integer IDCT. should produce bit-identical
   2676 // results to the generic C version.
   2677 static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
   2678 {
   2679    int16x8_t row0, row1, row2, row3, row4, row5, row6, row7;
   2680 
   2681    int16x4_t rot0_0 = vdup_n_s16(stbi__f2f(0.5411961f));
   2682    int16x4_t rot0_1 = vdup_n_s16(stbi__f2f(-1.847759065f));
   2683    int16x4_t rot0_2 = vdup_n_s16(stbi__f2f( 0.765366865f));
   2684    int16x4_t rot1_0 = vdup_n_s16(stbi__f2f( 1.175875602f));
   2685    int16x4_t rot1_1 = vdup_n_s16(stbi__f2f(-0.899976223f));
   2686    int16x4_t rot1_2 = vdup_n_s16(stbi__f2f(-2.562915447f));
   2687    int16x4_t rot2_0 = vdup_n_s16(stbi__f2f(-1.961570560f));
   2688    int16x4_t rot2_1 = vdup_n_s16(stbi__f2f(-0.390180644f));
   2689    int16x4_t rot3_0 = vdup_n_s16(stbi__f2f( 0.298631336f));
   2690    int16x4_t rot3_1 = vdup_n_s16(stbi__f2f( 2.053119869f));
   2691    int16x4_t rot3_2 = vdup_n_s16(stbi__f2f( 3.072711026f));
   2692    int16x4_t rot3_3 = vdup_n_s16(stbi__f2f( 1.501321110f));
   2693 
   2694 #define dct_long_mul(out, inq, coeff) \
   2695    int32x4_t out##_l = vmull_s16(vget_low_s16(inq), coeff); \
   2696    int32x4_t out##_h = vmull_s16(vget_high_s16(inq), coeff)
   2697 
   2698 #define dct_long_mac(out, acc, inq, coeff) \
   2699    int32x4_t out##_l = vmlal_s16(acc##_l, vget_low_s16(inq), coeff); \
   2700    int32x4_t out##_h = vmlal_s16(acc##_h, vget_high_s16(inq), coeff)
   2701 
   2702 #define dct_widen(out, inq) \
   2703    int32x4_t out##_l = vshll_n_s16(vget_low_s16(inq), 12); \
   2704    int32x4_t out##_h = vshll_n_s16(vget_high_s16(inq), 12)
   2705 
   2706 // wide add
   2707 #define dct_wadd(out, a, b) \
   2708    int32x4_t out##_l = vaddq_s32(a##_l, b##_l); \
   2709    int32x4_t out##_h = vaddq_s32(a##_h, b##_h)
   2710 
   2711 // wide sub
   2712 #define dct_wsub(out, a, b) \
   2713    int32x4_t out##_l = vsubq_s32(a##_l, b##_l); \
   2714    int32x4_t out##_h = vsubq_s32(a##_h, b##_h)
   2715 
   2716 // butterfly a/b, then shift using "shiftop" by "s" and pack
   2717 #define dct_bfly32o(out0,out1, a,b,shiftop,s) \
   2718    { \
   2719       dct_wadd(sum, a, b); \
   2720       dct_wsub(dif, a, b); \
   2721       out0 = vcombine_s16(shiftop(sum_l, s), shiftop(sum_h, s)); \
   2722       out1 = vcombine_s16(shiftop(dif_l, s), shiftop(dif_h, s)); \
   2723    }
   2724 
   2725 #define dct_pass(shiftop, shift) \
   2726    { \
   2727       /* even part */ \
   2728       int16x8_t sum26 = vaddq_s16(row2, row6); \
   2729       dct_long_mul(p1e, sum26, rot0_0); \
   2730       dct_long_mac(t2e, p1e, row6, rot0_1); \
   2731       dct_long_mac(t3e, p1e, row2, rot0_2); \
   2732       int16x8_t sum04 = vaddq_s16(row0, row4); \
   2733       int16x8_t dif04 = vsubq_s16(row0, row4); \
   2734       dct_widen(t0e, sum04); \
   2735       dct_widen(t1e, dif04); \
   2736       dct_wadd(x0, t0e, t3e); \
   2737       dct_wsub(x3, t0e, t3e); \
   2738       dct_wadd(x1, t1e, t2e); \
   2739       dct_wsub(x2, t1e, t2e); \
   2740       /* odd part */ \
   2741       int16x8_t sum15 = vaddq_s16(row1, row5); \
   2742       int16x8_t sum17 = vaddq_s16(row1, row7); \
   2743       int16x8_t sum35 = vaddq_s16(row3, row5); \
   2744       int16x8_t sum37 = vaddq_s16(row3, row7); \
   2745       int16x8_t sumodd = vaddq_s16(sum17, sum35); \
   2746       dct_long_mul(p5o, sumodd, rot1_0); \
   2747       dct_long_mac(p1o, p5o, sum17, rot1_1); \
   2748       dct_long_mac(p2o, p5o, sum35, rot1_2); \
   2749       dct_long_mul(p3o, sum37, rot2_0); \
   2750       dct_long_mul(p4o, sum15, rot2_1); \
   2751       dct_wadd(sump13o, p1o, p3o); \
   2752       dct_wadd(sump24o, p2o, p4o); \
   2753       dct_wadd(sump23o, p2o, p3o); \
   2754       dct_wadd(sump14o, p1o, p4o); \
   2755       dct_long_mac(x4, sump13o, row7, rot3_0); \
   2756       dct_long_mac(x5, sump24o, row5, rot3_1); \
   2757       dct_long_mac(x6, sump23o, row3, rot3_2); \
   2758       dct_long_mac(x7, sump14o, row1, rot3_3); \
   2759       dct_bfly32o(row0,row7, x0,x7,shiftop,shift); \
   2760       dct_bfly32o(row1,row6, x1,x6,shiftop,shift); \
   2761       dct_bfly32o(row2,row5, x2,x5,shiftop,shift); \
   2762       dct_bfly32o(row3,row4, x3,x4,shiftop,shift); \
   2763    }
   2764 
   2765    // load
   2766    row0 = vld1q_s16(data + 0*8);
   2767    row1 = vld1q_s16(data + 1*8);
   2768    row2 = vld1q_s16(data + 2*8);
   2769    row3 = vld1q_s16(data + 3*8);
   2770    row4 = vld1q_s16(data + 4*8);
   2771    row5 = vld1q_s16(data + 5*8);
   2772    row6 = vld1q_s16(data + 6*8);
   2773    row7 = vld1q_s16(data + 7*8);
   2774 
   2775    // add DC bias
   2776    row0 = vaddq_s16(row0, vsetq_lane_s16(1024, vdupq_n_s16(0), 0));
   2777 
   2778    // column pass
   2779    dct_pass(vrshrn_n_s32, 10);
   2780 
   2781    // 16bit 8x8 transpose
   2782    {
   2783 // these three map to a single VTRN.16, VTRN.32, and VSWP, respectively.
   2784 // whether compilers actually get this is another story, sadly.
   2785 #define dct_trn16(x, y) { int16x8x2_t t = vtrnq_s16(x, y); x = t.val[0]; y = t.val[1]; }
   2786 #define dct_trn32(x, y) { int32x4x2_t t = vtrnq_s32(vreinterpretq_s32_s16(x), vreinterpretq_s32_s16(y)); x = vreinterpretq_s16_s32(t.val[0]); y = vreinterpretq_s16_s32(t.val[1]); }
   2787 #define dct_trn64(x, y) { int16x8_t x0 = x; int16x8_t y0 = y; x = vcombine_s16(vget_low_s16(x0), vget_low_s16(y0)); y = vcombine_s16(vget_high_s16(x0), vget_high_s16(y0)); }
   2788 
   2789       // pass 1
   2790       dct_trn16(row0, row1); // a0b0a2b2a4b4a6b6
   2791       dct_trn16(row2, row3);
   2792       dct_trn16(row4, row5);
   2793       dct_trn16(row6, row7);
   2794 
   2795       // pass 2
   2796       dct_trn32(row0, row2); // a0b0c0d0a4b4c4d4
   2797       dct_trn32(row1, row3);
   2798       dct_trn32(row4, row6);
   2799       dct_trn32(row5, row7);
   2800 
   2801       // pass 3
   2802       dct_trn64(row0, row4); // a0b0c0d0e0f0g0h0
   2803       dct_trn64(row1, row5);
   2804       dct_trn64(row2, row6);
   2805       dct_trn64(row3, row7);
   2806 
   2807 #undef dct_trn16
   2808 #undef dct_trn32
   2809 #undef dct_trn64
   2810    }
   2811 
   2812    // row pass
   2813    // vrshrn_n_s32 only supports shifts up to 16, we need
   2814    // 17. so do a non-rounding shift of 16 first then follow
   2815    // up with a rounding shift by 1.
   2816    dct_pass(vshrn_n_s32, 16);
   2817 
   2818    {
   2819       // pack and round
   2820       uint8x8_t p0 = vqrshrun_n_s16(row0, 1);
   2821       uint8x8_t p1 = vqrshrun_n_s16(row1, 1);
   2822       uint8x8_t p2 = vqrshrun_n_s16(row2, 1);
   2823       uint8x8_t p3 = vqrshrun_n_s16(row3, 1);
   2824       uint8x8_t p4 = vqrshrun_n_s16(row4, 1);
   2825       uint8x8_t p5 = vqrshrun_n_s16(row5, 1);
   2826       uint8x8_t p6 = vqrshrun_n_s16(row6, 1);
   2827       uint8x8_t p7 = vqrshrun_n_s16(row7, 1);
   2828 
   2829       // again, these can translate into one instruction, but often don't.
   2830 #define dct_trn8_8(x, y) { uint8x8x2_t t = vtrn_u8(x, y); x = t.val[0]; y = t.val[1]; }
   2831 #define dct_trn8_16(x, y) { uint16x4x2_t t = vtrn_u16(vreinterpret_u16_u8(x), vreinterpret_u16_u8(y)); x = vreinterpret_u8_u16(t.val[0]); y = vreinterpret_u8_u16(t.val[1]); }
   2832 #define dct_trn8_32(x, y) { uint32x2x2_t t = vtrn_u32(vreinterpret_u32_u8(x), vreinterpret_u32_u8(y)); x = vreinterpret_u8_u32(t.val[0]); y = vreinterpret_u8_u32(t.val[1]); }
   2833 
   2834       // sadly can't use interleaved stores here since we only write
   2835       // 8 bytes to each scan line!
   2836 
   2837       // 8x8 8-bit transpose pass 1
   2838       dct_trn8_8(p0, p1);
   2839       dct_trn8_8(p2, p3);
   2840       dct_trn8_8(p4, p5);
   2841       dct_trn8_8(p6, p7);
   2842 
   2843       // pass 2
   2844       dct_trn8_16(p0, p2);
   2845       dct_trn8_16(p1, p3);
   2846       dct_trn8_16(p4, p6);
   2847       dct_trn8_16(p5, p7);
   2848 
   2849       // pass 3
   2850       dct_trn8_32(p0, p4);
   2851       dct_trn8_32(p1, p5);
   2852       dct_trn8_32(p2, p6);
   2853       dct_trn8_32(p3, p7);
   2854 
   2855       // store
   2856       vst1_u8(out, p0); out += out_stride;
   2857       vst1_u8(out, p1); out += out_stride;
   2858       vst1_u8(out, p2); out += out_stride;
   2859       vst1_u8(out, p3); out += out_stride;
   2860       vst1_u8(out, p4); out += out_stride;
   2861       vst1_u8(out, p5); out += out_stride;
   2862       vst1_u8(out, p6); out += out_stride;
   2863       vst1_u8(out, p7);
   2864 
   2865 #undef dct_trn8_8
   2866 #undef dct_trn8_16
   2867 #undef dct_trn8_32
   2868    }
   2869 
   2870 #undef dct_long_mul
   2871 #undef dct_long_mac
   2872 #undef dct_widen
   2873 #undef dct_wadd
   2874 #undef dct_wsub
   2875 #undef dct_bfly32o
   2876 #undef dct_pass
   2877 }
   2878 
   2879 #endif // STBI_NEON
   2880 
   2881 #define STBI__MARKER_none  0xff
   2882 // if there's a pending marker from the entropy stream, return that
   2883 // otherwise, fetch from the stream and get a marker. if there's no
   2884 // marker, return 0xff, which is never a valid marker value
   2885 static stbi_uc stbi__get_marker(stbi__jpeg *j)
   2886 {
   2887    stbi_uc x;
   2888    if (j->marker != STBI__MARKER_none) { x = j->marker; j->marker = STBI__MARKER_none; return x; }
   2889    x = stbi__get8(j->s);
   2890    if (x != 0xff) return STBI__MARKER_none;
   2891    while (x == 0xff)
   2892       x = stbi__get8(j->s); // consume repeated 0xff fill bytes
   2893    return x;
   2894 }
   2895 
   2896 // in each scan, we'll have scan_n components, and the order
   2897 // of the components is specified by order[]
   2898 #define STBI__RESTART(x)     ((x) >= 0xd0 && (x) <= 0xd7)
   2899 
   2900 // after a restart interval, stbi__jpeg_reset the entropy decoder and
   2901 // the dc prediction
   2902 static void stbi__jpeg_reset(stbi__jpeg *j)
   2903 {
   2904    j->code_bits = 0;
   2905    j->code_buffer = 0;
   2906    j->nomore = 0;
   2907    j->img_comp[0].dc_pred = j->img_comp[1].dc_pred = j->img_comp[2].dc_pred = j->img_comp[3].dc_pred = 0;
   2908    j->marker = STBI__MARKER_none;
   2909    j->todo = j->restart_interval ? j->restart_interval : 0x7fffffff;
   2910    j->eob_run = 0;
   2911    // no more than 1<<31 MCUs if no restart_interal? that's plenty safe,
   2912    // since we don't even allow 1<<30 pixels
   2913 }
   2914 
   2915 static int stbi__parse_entropy_coded_data(stbi__jpeg *z)
   2916 {
   2917    stbi__jpeg_reset(z);
   2918    if (!z->progressive) {
   2919       if (z->scan_n == 1) {
   2920          int i,j;
   2921          STBI_SIMD_ALIGN(short, data[64]);
   2922          int n = z->order[0];
   2923          // non-interleaved data, we just need to process one block at a time,
   2924          // in trivial scanline order
   2925          // number of blocks to do just depends on how many actual "pixels" this
   2926          // component has, independent of interleaved MCU blocking and such
   2927          int w = (z->img_comp[n].x+7) >> 3;
   2928          int h = (z->img_comp[n].y+7) >> 3;
   2929          for (j=0; j < h; ++j) {
   2930             for (i=0; i < w; ++i) {
   2931                int ha = z->img_comp[n].ha;
   2932                if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
   2933                z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
   2934                // every data block is an MCU, so countdown the restart interval
   2935                if (--z->todo <= 0) {
   2936                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
   2937                   // if it's NOT a restart, then just bail, so we get corrupt data
   2938                   // rather than no data
   2939                   if (!STBI__RESTART(z->marker)) return 1;
   2940                   stbi__jpeg_reset(z);
   2941                }
   2942             }
   2943          }
   2944          return 1;
   2945       } else { // interleaved
   2946          int i,j,k,x,y;
   2947          STBI_SIMD_ALIGN(short, data[64]);
   2948          for (j=0; j < z->img_mcu_y; ++j) {
   2949             for (i=0; i < z->img_mcu_x; ++i) {
   2950                // scan an interleaved mcu... process scan_n components in order
   2951                for (k=0; k < z->scan_n; ++k) {
   2952                   int n = z->order[k];
   2953                   // scan out an mcu's worth of this component; that's just determined
   2954                   // by the basic H and V specified for the component
   2955                   for (y=0; y < z->img_comp[n].v; ++y) {
   2956                      for (x=0; x < z->img_comp[n].h; ++x) {
   2957                         int x2 = (i*z->img_comp[n].h + x)*8;
   2958                         int y2 = (j*z->img_comp[n].v + y)*8;
   2959                         int ha = z->img_comp[n].ha;
   2960                         if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
   2961                         z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*y2+x2, z->img_comp[n].w2, data);
   2962                      }
   2963                   }
   2964                }
   2965                // after all interleaved components, that's an interleaved MCU,
   2966                // so now count down the restart interval
   2967                if (--z->todo <= 0) {
   2968                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
   2969                   if (!STBI__RESTART(z->marker)) return 1;
   2970                   stbi__jpeg_reset(z);
   2971                }
   2972             }
   2973          }
   2974          return 1;
   2975       }
   2976    } else {
   2977       if (z->scan_n == 1) {
   2978          int i,j;
   2979          int n = z->order[0];
   2980          // non-interleaved data, we just need to process one block at a time,
   2981          // in trivial scanline order
   2982          // number of blocks to do just depends on how many actual "pixels" this
   2983          // component has, independent of interleaved MCU blocking and such
   2984          int w = (z->img_comp[n].x+7) >> 3;
   2985          int h = (z->img_comp[n].y+7) >> 3;
   2986          for (j=0; j < h; ++j) {
   2987             for (i=0; i < w; ++i) {
   2988                short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
   2989                if (z->spec_start == 0) {
   2990                   if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
   2991                      return 0;
   2992                } else {
   2993                   int ha = z->img_comp[n].ha;
   2994                   if (!stbi__jpeg_decode_block_prog_ac(z, data, &z->huff_ac[ha], z->fast_ac[ha]))
   2995                      return 0;
   2996                }
   2997                // every data block is an MCU, so countdown the restart interval
   2998                if (--z->todo <= 0) {
   2999                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
   3000                   if (!STBI__RESTART(z->marker)) return 1;
   3001                   stbi__jpeg_reset(z);
   3002                }
   3003             }
   3004          }
   3005          return 1;
   3006       } else { // interleaved
   3007          int i,j,k,x,y;
   3008          for (j=0; j < z->img_mcu_y; ++j) {
   3009             for (i=0; i < z->img_mcu_x; ++i) {
   3010                // scan an interleaved mcu... process scan_n components in order
   3011                for (k=0; k < z->scan_n; ++k) {
   3012                   int n = z->order[k];
   3013                   // scan out an mcu's worth of this component; that's just determined
   3014                   // by the basic H and V specified for the component
   3015                   for (y=0; y < z->img_comp[n].v; ++y) {
   3016                      for (x=0; x < z->img_comp[n].h; ++x) {
   3017                         int x2 = (i*z->img_comp[n].h + x);
   3018                         int y2 = (j*z->img_comp[n].v + y);
   3019                         short *data = z->img_comp[n].coeff + 64 * (x2 + y2 * z->img_comp[n].coeff_w);
   3020                         if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
   3021                            return 0;
   3022                      }
   3023                   }
   3024                }
   3025                // after all interleaved components, that's an interleaved MCU,
   3026                // so now count down the restart interval
   3027                if (--z->todo <= 0) {
   3028                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
   3029                   if (!STBI__RESTART(z->marker)) return 1;
   3030                   stbi__jpeg_reset(z);
   3031                }
   3032             }
   3033          }
   3034          return 1;
   3035       }
   3036    }
   3037 }
   3038 
   3039 static void stbi__jpeg_dequantize(short *data, stbi__uint16 *dequant)
   3040 {
   3041    int i;
   3042    for (i=0; i < 64; ++i)
   3043       data[i] *= dequant[i];
   3044 }
   3045 
   3046 static void stbi__jpeg_finish(stbi__jpeg *z)
   3047 {
   3048    if (z->progressive) {
   3049       // dequantize and idct the data
   3050       int i,j,n;
   3051       for (n=0; n < z->s->img_n; ++n) {
   3052          int w = (z->img_comp[n].x+7) >> 3;
   3053          int h = (z->img_comp[n].y+7) >> 3;
   3054          for (j=0; j < h; ++j) {
   3055             for (i=0; i < w; ++i) {
   3056                short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
   3057                stbi__jpeg_dequantize(data, z->dequant[z->img_comp[n].tq]);
   3058                z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
   3059             }
   3060          }
   3061       }
   3062    }
   3063 }
   3064 
   3065 static int stbi__process_marker(stbi__jpeg *z, int m)
   3066 {
   3067    int L;
   3068    switch (m) {
   3069       case STBI__MARKER_none: // no marker found
   3070          return stbi__err("expected marker","Corrupt JPEG");
   3071 
   3072       case 0xDD: // DRI - specify restart interval
   3073          if (stbi__get16be(z->s) != 4) return stbi__err("bad DRI len","Corrupt JPEG");
   3074          z->restart_interval = stbi__get16be(z->s);
   3075          return 1;
   3076 
   3077       case 0xDB: // DQT - define quantization table
   3078          L = stbi__get16be(z->s)-2;
   3079          while (L > 0) {
   3080             int q = stbi__get8(z->s);
   3081             int p = q >> 4, sixteen = (p != 0);
   3082             int t = q & 15,i;
   3083             if (p != 0 && p != 1) return stbi__err("bad DQT type","Corrupt JPEG");
   3084             if (t > 3) return stbi__err("bad DQT table","Corrupt JPEG");
   3085 
   3086             for (i=0; i < 64; ++i)
   3087                z->dequant[t][stbi__jpeg_dezigzag[i]] = (stbi__uint16)(sixteen ? stbi__get16be(z->s) : stbi__get8(z->s));
   3088             L -= (sixteen ? 129 : 65);
   3089          }
   3090          return L==0;
   3091 
   3092       case 0xC4: // DHT - define huffman table
   3093          L = stbi__get16be(z->s)-2;
   3094          while (L > 0) {
   3095             stbi_uc *v;
   3096             int sizes[16],i,n=0;
   3097             int q = stbi__get8(z->s);
   3098             int tc = q >> 4;
   3099             int th = q & 15;
   3100             if (tc > 1 || th > 3) return stbi__err("bad DHT header","Corrupt JPEG");
   3101             for (i=0; i < 16; ++i) {
   3102                sizes[i] = stbi__get8(z->s);
   3103                n += sizes[i];
   3104             }
   3105             L -= 17;
   3106             if (tc == 0) {
   3107                if (!stbi__build_huffman(z->huff_dc+th, sizes)) return 0;
   3108                v = z->huff_dc[th].values;
   3109             } else {
   3110                if (!stbi__build_huffman(z->huff_ac+th, sizes)) return 0;
   3111                v = z->huff_ac[th].values;
   3112             }
   3113             for (i=0; i < n; ++i)
   3114                v[i] = stbi__get8(z->s);
   3115             if (tc != 0)
   3116                stbi__build_fast_ac(z->fast_ac[th], z->huff_ac + th);
   3117             L -= n;
   3118          }
   3119          return L==0;
   3120    }
   3121 
   3122    // check for comment block or APP blocks
   3123    if ((m >= 0xE0 && m <= 0xEF) || m == 0xFE) {
   3124       L = stbi__get16be(z->s);
   3125       if (L < 2) {
   3126          if (m == 0xFE)
   3127             return stbi__err("bad COM len","Corrupt JPEG");
   3128          else
   3129             return stbi__err("bad APP len","Corrupt JPEG");
   3130       }
   3131       L -= 2;
   3132 
   3133       if (m == 0xE0 && L >= 5) { // JFIF APP0 segment
   3134          static const unsigned char tag[5] = {'J','F','I','F','\0'};
   3135          int ok = 1;
   3136          int i;
   3137          for (i=0; i < 5; ++i)
   3138             if (stbi__get8(z->s) != tag[i])
   3139                ok = 0;
   3140          L -= 5;
   3141          if (ok)
   3142             z->jfif = 1;
   3143       } else if (m == 0xEE && L >= 12) { // Adobe APP14 segment
   3144          static const unsigned char tag[6] = {'A','d','o','b','e','\0'};
   3145          int ok = 1;
   3146          int i;
   3147          for (i=0; i < 6; ++i)
   3148             if (stbi__get8(z->s) != tag[i])
   3149                ok = 0;
   3150          L -= 6;
   3151          if (ok) {
   3152             stbi__get8(z->s); // version
   3153             stbi__get16be(z->s); // flags0
   3154             stbi__get16be(z->s); // flags1
   3155             z->app14_color_transform = stbi__get8(z->s); // color transform
   3156             L -= 6;
   3157          }
   3158       }
   3159 
   3160       stbi__skip(z->s, L);
   3161       return 1;
   3162    }
   3163 
   3164    return stbi__err("unknown marker","Corrupt JPEG");
   3165 }
   3166 
   3167 // after we see SOS
   3168 static int stbi__process_scan_header(stbi__jpeg *z)
   3169 {
   3170    int i;
   3171    int Ls = stbi__get16be(z->s);
   3172    z->scan_n = stbi__get8(z->s);
   3173    if (z->scan_n < 1 || z->scan_n > 4 || z->scan_n > (int) z->s->img_n) return stbi__err("bad SOS component count","Corrupt JPEG");
   3174    if (Ls != 6+2*z->scan_n) return stbi__err("bad SOS len","Corrupt JPEG");
   3175    for (i=0; i < z->scan_n; ++i) {
   3176       int id = stbi__get8(z->s), which;
   3177       int q = stbi__get8(z->s);
   3178       for (which = 0; which < z->s->img_n; ++which)
   3179          if (z->img_comp[which].id == id)
   3180             break;
   3181       if (which == z->s->img_n) return 0; // no match
   3182       z->img_comp[which].hd = q >> 4;   if (z->img_comp[which].hd > 3) return stbi__err("bad DC huff","Corrupt JPEG");
   3183       z->img_comp[which].ha = q & 15;   if (z->img_comp[which].ha > 3) return stbi__err("bad AC huff","Corrupt JPEG");
   3184       z->order[i] = which;
   3185    }
   3186 
   3187    {
   3188       int aa;
   3189       z->spec_start = stbi__get8(z->s);
   3190       z->spec_end   = stbi__get8(z->s); // should be 63, but might be 0
   3191       aa = stbi__get8(z->s);
   3192       z->succ_high = (aa >> 4);
   3193       z->succ_low  = (aa & 15);
   3194       if (z->progressive) {
   3195          if (z->spec_start > 63 || z->spec_end > 63  || z->spec_start > z->spec_end || z->succ_high > 13 || z->succ_low > 13)
   3196             return stbi__err("bad SOS", "Corrupt JPEG");
   3197       } else {
   3198          if (z->spec_start != 0) return stbi__err("bad SOS","Corrupt JPEG");
   3199          if (z->succ_high != 0 || z->succ_low != 0) return stbi__err("bad SOS","Corrupt JPEG");
   3200          z->spec_end = 63;
   3201       }
   3202    }
   3203 
   3204    return 1;
   3205 }
   3206 
   3207 static int stbi__free_jpeg_components(stbi__jpeg *z, int ncomp, int why)
   3208 {
   3209    int i;
   3210    for (i=0; i < ncomp; ++i) {
   3211       if (z->img_comp[i].raw_data) {
   3212          STBI_FREE(z->img_comp[i].raw_data);
   3213          z->img_comp[i].raw_data = NULL;
   3214          z->img_comp[i].data = NULL;
   3215       }
   3216       if (z->img_comp[i].raw_coeff) {
   3217          STBI_FREE(z->img_comp[i].raw_coeff);
   3218          z->img_comp[i].raw_coeff = 0;
   3219          z->img_comp[i].coeff = 0;
   3220       }
   3221       if (z->img_comp[i].linebuf) {
   3222          STBI_FREE(z->img_comp[i].linebuf);
   3223          z->img_comp[i].linebuf = NULL;
   3224       }
   3225    }
   3226    return why;
   3227 }
   3228 
   3229 static int stbi__process_frame_header(stbi__jpeg *z, int scan)
   3230 {
   3231    stbi__context *s = z->s;
   3232    int Lf,p,i,q, h_max=1,v_max=1,c;
   3233    Lf = stbi__get16be(s);         if (Lf < 11) return stbi__err("bad SOF len","Corrupt JPEG"); // JPEG
   3234    p  = stbi__get8(s);            if (p != 8) return stbi__err("only 8-bit","JPEG format not supported: 8-bit only"); // JPEG baseline
   3235    s->img_y = stbi__get16be(s);   if (s->img_y == 0) return stbi__err("no header height", "JPEG format not supported: delayed height"); // Legal, but we don't handle it--but neither does IJG
   3236    s->img_x = stbi__get16be(s);   if (s->img_x == 0) return stbi__err("0 width","Corrupt JPEG"); // JPEG requires
   3237    if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
   3238    if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
   3239    c = stbi__get8(s);
   3240    if (c != 3 && c != 1 && c != 4) return stbi__err("bad component count","Corrupt JPEG");
   3241    s->img_n = c;
   3242    for (i=0; i < c; ++i) {
   3243       z->img_comp[i].data = NULL;
   3244       z->img_comp[i].linebuf = NULL;
   3245    }
   3246 
   3247    if (Lf != 8+3*s->img_n) return stbi__err("bad SOF len","Corrupt JPEG");
   3248 
   3249    z->rgb = 0;
   3250    for (i=0; i < s->img_n; ++i) {
   3251       static const unsigned char rgb[3] = { 'R', 'G', 'B' };
   3252       z->img_comp[i].id = stbi__get8(s);
   3253       if (s->img_n == 3 && z->img_comp[i].id == rgb[i])
   3254          ++z->rgb;
   3255       q = stbi__get8(s);
   3256       z->img_comp[i].h = (q >> 4);  if (!z->img_comp[i].h || z->img_comp[i].h > 4) return stbi__err("bad H","Corrupt JPEG");
   3257       z->img_comp[i].v = q & 15;    if (!z->img_comp[i].v || z->img_comp[i].v > 4) return stbi__err("bad V","Corrupt JPEG");
   3258       z->img_comp[i].tq = stbi__get8(s);  if (z->img_comp[i].tq > 3) return stbi__err("bad TQ","Corrupt JPEG");
   3259    }
   3260 
   3261    if (scan != STBI__SCAN_load) return 1;
   3262 
   3263    if (!stbi__mad3sizes_valid(s->img_x, s->img_y, s->img_n, 0)) return stbi__err("too large", "Image too large to decode");
   3264 
   3265    for (i=0; i < s->img_n; ++i) {
   3266       if (z->img_comp[i].h > h_max) h_max = z->img_comp[i].h;
   3267       if (z->img_comp[i].v > v_max) v_max = z->img_comp[i].v;
   3268    }
   3269 
   3270    // compute interleaved mcu info
   3271    z->img_h_max = h_max;
   3272    z->img_v_max = v_max;
   3273    z->img_mcu_w = h_max * 8;
   3274    z->img_mcu_h = v_max * 8;
   3275    // these sizes can't be more than 17 bits
   3276    z->img_mcu_x = (s->img_x + z->img_mcu_w-1) / z->img_mcu_w;
   3277    z->img_mcu_y = (s->img_y + z->img_mcu_h-1) / z->img_mcu_h;
   3278 
   3279    for (i=0; i < s->img_n; ++i) {
   3280       // number of effective pixels (e.g. for non-interleaved MCU)
   3281       z->img_comp[i].x = (s->img_x * z->img_comp[i].h + h_max-1) / h_max;
   3282       z->img_comp[i].y = (s->img_y * z->img_comp[i].v + v_max-1) / v_max;
   3283       // to simplify generation, we'll allocate enough memory to decode
   3284       // the bogus oversized data from using interleaved MCUs and their
   3285       // big blocks (e.g. a 16x16 iMCU on an image of width 33); we won't
   3286       // discard the extra data until colorspace conversion
   3287       //
   3288       // img_mcu_x, img_mcu_y: <=17 bits; comp[i].h and .v are <=4 (checked earlier)
   3289       // so these muls can't overflow with 32-bit ints (which we require)
   3290       z->img_comp[i].w2 = z->img_mcu_x * z->img_comp[i].h * 8;
   3291       z->img_comp[i].h2 = z->img_mcu_y * z->img_comp[i].v * 8;
   3292       z->img_comp[i].coeff = 0;
   3293       z->img_comp[i].raw_coeff = 0;
   3294       z->img_comp[i].linebuf = NULL;
   3295       z->img_comp[i].raw_data = stbi__malloc_mad2(z->img_comp[i].w2, z->img_comp[i].h2, 15);
   3296       if (z->img_comp[i].raw_data == NULL)
   3297          return stbi__free_jpeg_components(z, i+1, stbi__err("outofmem", "Out of memory"));
   3298       // align blocks for idct using mmx/sse
   3299       z->img_comp[i].data = (stbi_uc*) (((size_t) z->img_comp[i].raw_data + 15) & ~15);
   3300       if (z->progressive) {
   3301          // w2, h2 are multiples of 8 (see above)
   3302          z->img_comp[i].coeff_w = z->img_comp[i].w2 / 8;
   3303          z->img_comp[i].coeff_h = z->img_comp[i].h2 / 8;
   3304          z->img_comp[i].raw_coeff = stbi__malloc_mad3(z->img_comp[i].w2, z->img_comp[i].h2, sizeof(short), 15);
   3305          if (z->img_comp[i].raw_coeff == NULL)
   3306             return stbi__free_jpeg_components(z, i+1, stbi__err("outofmem", "Out of memory"));
   3307          z->img_comp[i].coeff = (short*) (((size_t) z->img_comp[i].raw_coeff + 15) & ~15);
   3308       }
   3309    }
   3310 
   3311    return 1;
   3312 }
   3313 
   3314 // use comparisons since in some cases we handle more than one case (e.g. SOF)
   3315 #define stbi__DNL(x)         ((x) == 0xdc)
   3316 #define stbi__SOI(x)         ((x) == 0xd8)
   3317 #define stbi__EOI(x)         ((x) == 0xd9)
   3318 #define stbi__SOF(x)         ((x) == 0xc0 || (x) == 0xc1 || (x) == 0xc2)
   3319 #define stbi__SOS(x)         ((x) == 0xda)
   3320 
   3321 #define stbi__SOF_progressive(x)   ((x) == 0xc2)
   3322 
   3323 static int stbi__decode_jpeg_header(stbi__jpeg *z, int scan)
   3324 {
   3325    int m;
   3326    z->jfif = 0;
   3327    z->app14_color_transform = -1; // valid values are 0,1,2
   3328    z->marker = STBI__MARKER_none; // initialize cached marker to empty
   3329    m = stbi__get_marker(z);
   3330    if (!stbi__SOI(m)) return stbi__err("no SOI","Corrupt JPEG");
   3331    if (scan == STBI__SCAN_type) return 1;
   3332    m = stbi__get_marker(z);
   3333    while (!stbi__SOF(m)) {
   3334       if (!stbi__process_marker(z,m)) return 0;
   3335       m = stbi__get_marker(z);
   3336       while (m == STBI__MARKER_none) {
   3337          // some files have extra padding after their blocks, so ok, we'll scan
   3338          if (stbi__at_eof(z->s)) return stbi__err("no SOF", "Corrupt JPEG");
   3339          m = stbi__get_marker(z);
   3340       }
   3341    }
   3342    z->progressive = stbi__SOF_progressive(m);
   3343    if (!stbi__process_frame_header(z, scan)) return 0;
   3344    return 1;
   3345 }
   3346 
   3347 // decode image to YCbCr format
   3348 static int stbi__decode_jpeg_image(stbi__jpeg *j)
   3349 {
   3350    int m;
   3351    for (m = 0; m < 4; m++) {
   3352       j->img_comp[m].raw_data = NULL;
   3353       j->img_comp[m].raw_coeff = NULL;
   3354    }
   3355    j->restart_interval = 0;
   3356    if (!stbi__decode_jpeg_header(j, STBI__SCAN_load)) return 0;
   3357    m = stbi__get_marker(j);
   3358    while (!stbi__EOI(m)) {
   3359       if (stbi__SOS(m)) {
   3360          if (!stbi__process_scan_header(j)) return 0;
   3361          if (!stbi__parse_entropy_coded_data(j)) return 0;
   3362          if (j->marker == STBI__MARKER_none ) {
   3363             // handle 0s at the end of image data from IP Kamera 9060
   3364             while (!stbi__at_eof(j->s)) {
   3365                int x = stbi__get8(j->s);
   3366                if (x == 255) {
   3367                   j->marker = stbi__get8(j->s);
   3368                   break;
   3369                }
   3370             }
   3371             // if we reach eof without hitting a marker, stbi__get_marker() below will fail and we'll eventually return 0
   3372          }
   3373       } else if (stbi__DNL(m)) {
   3374          int Ld = stbi__get16be(j->s);
   3375          stbi__uint32 NL = stbi__get16be(j->s);
   3376          if (Ld != 4) return stbi__err("bad DNL len", "Corrupt JPEG");
   3377          if (NL != j->s->img_y) return stbi__err("bad DNL height", "Corrupt JPEG");
   3378       } else {
   3379          if (!stbi__process_marker(j, m)) return 0;
   3380       }
   3381       m = stbi__get_marker(j);
   3382    }
   3383    if (j->progressive)
   3384       stbi__jpeg_finish(j);
   3385    return 1;
   3386 }
   3387 
   3388 // static jfif-centered resampling (across block boundaries)
   3389 
   3390 typedef stbi_uc *(*resample_row_func)(stbi_uc *out, stbi_uc *in0, stbi_uc *in1,
   3391                                     int w, int hs);
   3392 
   3393 #define stbi__div4(x) ((stbi_uc) ((x) >> 2))
   3394 
   3395 static stbi_uc *resample_row_1(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
   3396 {
   3397    STBI_NOTUSED(out);
   3398    STBI_NOTUSED(in_far);
   3399    STBI_NOTUSED(w);
   3400    STBI_NOTUSED(hs);
   3401    return in_near;
   3402 }
   3403 
   3404 static stbi_uc* stbi__resample_row_v_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
   3405 {
   3406    // need to generate two samples vertically for every one in input
   3407    int i;
   3408    STBI_NOTUSED(hs);
   3409    for (i=0; i < w; ++i)
   3410       out[i] = stbi__div4(3*in_near[i] + in_far[i] + 2);
   3411    return out;
   3412 }
   3413 
   3414 static stbi_uc*  stbi__resample_row_h_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
   3415 {
   3416    // need to generate two samples horizontally for every one in input
   3417    int i;
   3418    stbi_uc *input = in_near;
   3419 
   3420    if (w == 1) {
   3421       // if only one sample, can't do any interpolation
   3422       out[0] = out[1] = input[0];
   3423       return out;
   3424    }
   3425 
   3426    out[0] = input[0];
   3427    out[1] = stbi__div4(input[0]*3 + input[1] + 2);
   3428    for (i=1; i < w-1; ++i) {
   3429       int n = 3*input[i]+2;
   3430       out[i*2+0] = stbi__div4(n+input[i-1]);
   3431       out[i*2+1] = stbi__div4(n+input[i+1]);
   3432    }
   3433    out[i*2+0] = stbi__div4(input[w-2]*3 + input[w-1] + 2);
   3434    out[i*2+1] = input[w-1];
   3435 
   3436    STBI_NOTUSED(in_far);
   3437    STBI_NOTUSED(hs);
   3438 
   3439    return out;
   3440 }
   3441 
   3442 #define stbi__div16(x) ((stbi_uc) ((x) >> 4))
   3443 
   3444 static stbi_uc *stbi__resample_row_hv_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
   3445 {
   3446    // need to generate 2x2 samples for every one in input
   3447    int i,t0,t1;
   3448    if (w == 1) {
   3449       out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
   3450       return out;
   3451    }
   3452 
   3453    t1 = 3*in_near[0] + in_far[0];
   3454    out[0] = stbi__div4(t1+2);
   3455    for (i=1; i < w; ++i) {
   3456       t0 = t1;
   3457       t1 = 3*in_near[i]+in_far[i];
   3458       out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
   3459       out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
   3460    }
   3461    out[w*2-1] = stbi__div4(t1+2);
   3462 
   3463    STBI_NOTUSED(hs);
   3464 
   3465    return out;
   3466 }
   3467 
   3468 #if defined(STBI_SSE2) || defined(STBI_NEON)
   3469 static stbi_uc *stbi__resample_row_hv_2_simd(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
   3470 {
   3471    // need to generate 2x2 samples for every one in input
   3472    int i=0,t0,t1;
   3473 
   3474    if (w == 1) {
   3475       out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
   3476       return out;
   3477    }
   3478 
   3479    t1 = 3*in_near[0] + in_far[0];
   3480    // process groups of 8 pixels for as long as we can.
   3481    // note we can't handle the last pixel in a row in this loop
   3482    // because we need to handle the filter boundary conditions.
   3483    for (; i < ((w-1) & ~7); i += 8) {
   3484 #if defined(STBI_SSE2)
   3485       // load and perform the vertical filtering pass
   3486       // this uses 3*x + y = 4*x + (y - x)
   3487       __m128i zero  = _mm_setzero_si128();
   3488       __m128i farb  = _mm_loadl_epi64((__m128i *) (in_far + i));
   3489       __m128i nearb = _mm_loadl_epi64((__m128i *) (in_near + i));
   3490       __m128i farw  = _mm_unpacklo_epi8(farb, zero);
   3491       __m128i nearw = _mm_unpacklo_epi8(nearb, zero);
   3492       __m128i diff  = _mm_sub_epi16(farw, nearw);
   3493       __m128i nears = _mm_slli_epi16(nearw, 2);
   3494       __m128i curr  = _mm_add_epi16(nears, diff); // current row
   3495 
   3496       // horizontal filter works the same based on shifted vers of current
   3497       // row. "prev" is current row shifted right by 1 pixel; we need to
   3498       // insert the previous pixel value (from t1).
   3499       // "next" is current row shifted left by 1 pixel, with first pixel
   3500       // of next block of 8 pixels added in.
   3501       __m128i prv0 = _mm_slli_si128(curr, 2);
   3502       __m128i nxt0 = _mm_srli_si128(curr, 2);
   3503       __m128i prev = _mm_insert_epi16(prv0, t1, 0);
   3504       __m128i next = _mm_insert_epi16(nxt0, 3*in_near[i+8] + in_far[i+8], 7);
   3505 
   3506       // horizontal filter, polyphase implementation since it's convenient:
   3507       // even pixels = 3*cur + prev = cur*4 + (prev - cur)
   3508       // odd  pixels = 3*cur + next = cur*4 + (next - cur)
   3509       // note the shared term.
   3510       __m128i bias  = _mm_set1_epi16(8);
   3511       __m128i curs = _mm_slli_epi16(curr, 2);
   3512       __m128i prvd = _mm_sub_epi16(prev, curr);
   3513       __m128i nxtd = _mm_sub_epi16(next, curr);
   3514       __m128i curb = _mm_add_epi16(curs, bias);
   3515       __m128i even = _mm_add_epi16(prvd, curb);
   3516       __m128i odd  = _mm_add_epi16(nxtd, curb);
   3517 
   3518       // interleave even and odd pixels, then undo scaling.
   3519       __m128i int0 = _mm_unpacklo_epi16(even, odd);
   3520       __m128i int1 = _mm_unpackhi_epi16(even, odd);
   3521       __m128i de0  = _mm_srli_epi16(int0, 4);
   3522       __m128i de1  = _mm_srli_epi16(int1, 4);
   3523 
   3524       // pack and write output
   3525       __m128i outv = _mm_packus_epi16(de0, de1);
   3526       _mm_storeu_si128((__m128i *) (out + i*2), outv);
   3527 #elif defined(STBI_NEON)
   3528       // load and perform the vertical filtering pass
   3529       // this uses 3*x + y = 4*x + (y - x)
   3530       uint8x8_t farb  = vld1_u8(in_far + i);
   3531       uint8x8_t nearb = vld1_u8(in_near + i);
   3532       int16x8_t diff  = vreinterpretq_s16_u16(vsubl_u8(farb, nearb));
   3533       int16x8_t nears = vreinterpretq_s16_u16(vshll_n_u8(nearb, 2));
   3534       int16x8_t curr  = vaddq_s16(nears, diff); // current row
   3535 
   3536       // horizontal filter works the same based on shifted vers of current
   3537       // row. "prev" is current row shifted right by 1 pixel; we need to
   3538       // insert the previous pixel value (from t1).
   3539       // "next" is current row shifted left by 1 pixel, with first pixel
   3540       // of next block of 8 pixels added in.
   3541       int16x8_t prv0 = vextq_s16(curr, curr, 7);
   3542       int16x8_t nxt0 = vextq_s16(curr, curr, 1);
   3543       int16x8_t prev = vsetq_lane_s16(t1, prv0, 0);
   3544       int16x8_t next = vsetq_lane_s16(3*in_near[i+8] + in_far[i+8], nxt0, 7);
   3545 
   3546       // horizontal filter, polyphase implementation since it's convenient:
   3547       // even pixels = 3*cur + prev = cur*4 + (prev - cur)
   3548       // odd  pixels = 3*cur + next = cur*4 + (next - cur)
   3549       // note the shared term.
   3550       int16x8_t curs = vshlq_n_s16(curr, 2);
   3551       int16x8_t prvd = vsubq_s16(prev, curr);
   3552       int16x8_t nxtd = vsubq_s16(next, curr);
   3553       int16x8_t even = vaddq_s16(curs, prvd);
   3554       int16x8_t odd  = vaddq_s16(curs, nxtd);
   3555 
   3556       // undo scaling and round, then store with even/odd phases interleaved
   3557       uint8x8x2_t o;
   3558       o.val[0] = vqrshrun_n_s16(even, 4);
   3559       o.val[1] = vqrshrun_n_s16(odd,  4);
   3560       vst2_u8(out + i*2, o);
   3561 #endif
   3562 
   3563       // "previous" value for next iter
   3564       t1 = 3*in_near[i+7] + in_far[i+7];
   3565    }
   3566 
   3567    t0 = t1;
   3568    t1 = 3*in_near[i] + in_far[i];
   3569    out[i*2] = stbi__div16(3*t1 + t0 + 8);
   3570 
   3571    for (++i; i < w; ++i) {
   3572       t0 = t1;
   3573       t1 = 3*in_near[i]+in_far[i];
   3574       out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
   3575       out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
   3576    }
   3577    out[w*2-1] = stbi__div4(t1+2);
   3578 
   3579    STBI_NOTUSED(hs);
   3580 
   3581    return out;
   3582 }
   3583 #endif
   3584 
   3585 static stbi_uc *stbi__resample_row_generic(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
   3586 {
   3587    // resample with nearest-neighbor
   3588    int i,j;
   3589    STBI_NOTUSED(in_far);
   3590    for (i=0; i < w; ++i)
   3591       for (j=0; j < hs; ++j)
   3592          out[i*hs+j] = in_near[i];
   3593    return out;
   3594 }
   3595 
   3596 // this is a reduced-precision calculation of YCbCr-to-RGB introduced
   3597 // to make sure the code produces the same results in both SIMD and scalar
   3598 #define stbi__float2fixed(x)  (((int) ((x) * 4096.0f + 0.5f)) << 8)
   3599 static void stbi__YCbCr_to_RGB_row(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step)
   3600 {
   3601    int i;
   3602    for (i=0; i < count; ++i) {
   3603       int y_fixed = (y[i] << 20) + (1<<19); // rounding
   3604       int r,g,b;
   3605       int cr = pcr[i] - 128;
   3606       int cb = pcb[i] - 128;
   3607       r = y_fixed +  cr* stbi__float2fixed(1.40200f);
   3608       g = y_fixed + (cr*-stbi__float2fixed(0.71414f)) + ((cb*-stbi__float2fixed(0.34414f)) & 0xffff0000);
   3609       b = y_fixed                                     +   cb* stbi__float2fixed(1.77200f);
   3610       r >>= 20;
   3611       g >>= 20;
   3612       b >>= 20;
   3613       if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
   3614       if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
   3615       if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
   3616       out[0] = (stbi_uc)r;
   3617       out[1] = (stbi_uc)g;
   3618       out[2] = (stbi_uc)b;
   3619       out[3] = 255;
   3620       out += step;
   3621    }
   3622 }
   3623 
   3624 #if defined(STBI_SSE2) || defined(STBI_NEON)
   3625 static void stbi__YCbCr_to_RGB_simd(stbi_uc *out, stbi_uc const *y, stbi_uc const *pcb, stbi_uc const *pcr, int count, int step)
   3626 {
   3627    int i = 0;
   3628 
   3629 #ifdef STBI_SSE2
   3630    // step == 3 is pretty ugly on the final interleave, and i'm not convinced
   3631    // it's useful in practice (you wouldn't use it for textures, for example).
   3632    // so just accelerate step == 4 case.
   3633    if (step == 4) {
   3634       // this is a fairly straightforward implementation and not super-optimized.
   3635       __m128i signflip  = _mm_set1_epi8(-0x80);
   3636       __m128i cr_const0 = _mm_set1_epi16(   (short) ( 1.40200f*4096.0f+0.5f));
   3637       __m128i cr_const1 = _mm_set1_epi16( - (short) ( 0.71414f*4096.0f+0.5f));
   3638       __m128i cb_const0 = _mm_set1_epi16( - (short) ( 0.34414f*4096.0f+0.5f));
   3639       __m128i cb_const1 = _mm_set1_epi16(   (short) ( 1.77200f*4096.0f+0.5f));
   3640       __m128i y_bias = _mm_set1_epi8((char) (unsigned char) 128);
   3641       __m128i xw = _mm_set1_epi16(255); // alpha channel
   3642 
   3643       for (; i+7 < count; i += 8) {
   3644          // load
   3645          __m128i y_bytes = _mm_loadl_epi64((__m128i *) (y+i));
   3646          __m128i cr_bytes = _mm_loadl_epi64((__m128i *) (pcr+i));
   3647          __m128i cb_bytes = _mm_loadl_epi64((__m128i *) (pcb+i));
   3648          __m128i cr_biased = _mm_xor_si128(cr_bytes, signflip); // -128
   3649          __m128i cb_biased = _mm_xor_si128(cb_bytes, signflip); // -128
   3650 
   3651          // unpack to short (and left-shift cr, cb by 8)
   3652          __m128i yw  = _mm_unpacklo_epi8(y_bias, y_bytes);
   3653          __m128i crw = _mm_unpacklo_epi8(_mm_setzero_si128(), cr_biased);
   3654          __m128i cbw = _mm_unpacklo_epi8(_mm_setzero_si128(), cb_biased);
   3655 
   3656          // color transform
   3657          __m128i yws = _mm_srli_epi16(yw, 4);
   3658          __m128i cr0 = _mm_mulhi_epi16(cr_const0, crw);
   3659          __m128i cb0 = _mm_mulhi_epi16(cb_const0, cbw);
   3660          __m128i cb1 = _mm_mulhi_epi16(cbw, cb_const1);
   3661          __m128i cr1 = _mm_mulhi_epi16(crw, cr_const1);
   3662          __m128i rws = _mm_add_epi16(cr0, yws);
   3663          __m128i gwt = _mm_add_epi16(cb0, yws);
   3664          __m128i bws = _mm_add_epi16(yws, cb1);
   3665          __m128i gws = _mm_add_epi16(gwt, cr1);
   3666 
   3667          // descale
   3668          __m128i rw = _mm_srai_epi16(rws, 4);
   3669          __m128i bw = _mm_srai_epi16(bws, 4);
   3670          __m128i gw = _mm_srai_epi16(gws, 4);
   3671 
   3672          // back to byte, set up for transpose
   3673          __m128i brb = _mm_packus_epi16(rw, bw);
   3674          __m128i gxb = _mm_packus_epi16(gw, xw);
   3675 
   3676          // transpose to interleave channels
   3677          __m128i t0 = _mm_unpacklo_epi8(brb, gxb);
   3678          __m128i t1 = _mm_unpackhi_epi8(brb, gxb);
   3679          __m128i o0 = _mm_unpacklo_epi16(t0, t1);
   3680          __m128i o1 = _mm_unpackhi_epi16(t0, t1);
   3681 
   3682          // store
   3683          _mm_storeu_si128((__m128i *) (out + 0), o0);
   3684          _mm_storeu_si128((__m128i *) (out + 16), o1);
   3685          out += 32;
   3686       }
   3687    }
   3688 #endif
   3689 
   3690 #ifdef STBI_NEON
   3691    // in this version, step=3 support would be easy to add. but is there demand?
   3692    if (step == 4) {
   3693       // this is a fairly straightforward implementation and not super-optimized.
   3694       uint8x8_t signflip = vdup_n_u8(0x80);
   3695       int16x8_t cr_const0 = vdupq_n_s16(   (short) ( 1.40200f*4096.0f+0.5f));
   3696       int16x8_t cr_const1 = vdupq_n_s16( - (short) ( 0.71414f*4096.0f+0.5f));
   3697       int16x8_t cb_const0 = vdupq_n_s16( - (short) ( 0.34414f*4096.0f+0.5f));
   3698       int16x8_t cb_const1 = vdupq_n_s16(   (short) ( 1.77200f*4096.0f+0.5f));
   3699 
   3700       for (; i+7 < count; i += 8) {
   3701          // load
   3702          uint8x8_t y_bytes  = vld1_u8(y + i);
   3703          uint8x8_t cr_bytes = vld1_u8(pcr + i);
   3704          uint8x8_t cb_bytes = vld1_u8(pcb + i);
   3705          int8x8_t cr_biased = vreinterpret_s8_u8(vsub_u8(cr_bytes, signflip));
   3706          int8x8_t cb_biased = vreinterpret_s8_u8(vsub_u8(cb_bytes, signflip));
   3707 
   3708          // expand to s16
   3709          int16x8_t yws = vreinterpretq_s16_u16(vshll_n_u8(y_bytes, 4));
   3710          int16x8_t crw = vshll_n_s8(cr_biased, 7);
   3711          int16x8_t cbw = vshll_n_s8(cb_biased, 7);
   3712 
   3713          // color transform
   3714          int16x8_t cr0 = vqdmulhq_s16(crw, cr_const0);
   3715          int16x8_t cb0 = vqdmulhq_s16(cbw, cb_const0);
   3716          int16x8_t cr1 = vqdmulhq_s16(crw, cr_const1);
   3717          int16x8_t cb1 = vqdmulhq_s16(cbw, cb_const1);
   3718          int16x8_t rws = vaddq_s16(yws, cr0);
   3719          int16x8_t gws = vaddq_s16(vaddq_s16(yws, cb0), cr1);
   3720          int16x8_t bws = vaddq_s16(yws, cb1);
   3721 
   3722          // undo scaling, round, convert to byte
   3723          uint8x8x4_t o;
   3724          o.val[0] = vqrshrun_n_s16(rws, 4);
   3725          o.val[1] = vqrshrun_n_s16(gws, 4);
   3726          o.val[2] = vqrshrun_n_s16(bws, 4);
   3727          o.val[3] = vdup_n_u8(255);
   3728 
   3729          // store, interleaving r/g/b/a
   3730          vst4_u8(out, o);
   3731          out += 8*4;
   3732       }
   3733    }
   3734 #endif
   3735 
   3736    for (; i < count; ++i) {
   3737       int y_fixed = (y[i] << 20) + (1<<19); // rounding
   3738       int r,g,b;
   3739       int cr = pcr[i] - 128;
   3740       int cb = pcb[i] - 128;
   3741       r = y_fixed + cr* stbi__float2fixed(1.40200f);
   3742       g = y_fixed + cr*-stbi__float2fixed(0.71414f) + ((cb*-stbi__float2fixed(0.34414f)) & 0xffff0000);
   3743       b = y_fixed                                   +   cb* stbi__float2fixed(1.77200f);
   3744       r >>= 20;
   3745       g >>= 20;
   3746       b >>= 20;
   3747       if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
   3748       if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
   3749       if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
   3750       out[0] = (stbi_uc)r;
   3751       out[1] = (stbi_uc)g;
   3752       out[2] = (stbi_uc)b;
   3753       out[3] = 255;
   3754       out += step;
   3755    }
   3756 }
   3757 #endif
   3758 
   3759 // set up the kernels
   3760 static void stbi__setup_jpeg(stbi__jpeg *j)
   3761 {
   3762    j->idct_block_kernel = stbi__idct_block;
   3763    j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_row;
   3764    j->resample_row_hv_2_kernel = stbi__resample_row_hv_2;
   3765 
   3766 #ifdef STBI_SSE2
   3767    if (stbi__sse2_available()) {
   3768       j->idct_block_kernel = stbi__idct_simd;
   3769       j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
   3770       j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
   3771    }
   3772 #endif
   3773 
   3774 #ifdef STBI_NEON
   3775    j->idct_block_kernel = stbi__idct_simd;
   3776    j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
   3777    j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
   3778 #endif
   3779 }
   3780 
   3781 // clean up the temporary component buffers
   3782 static void stbi__cleanup_jpeg(stbi__jpeg *j)
   3783 {
   3784    stbi__free_jpeg_components(j, j->s->img_n, 0);
   3785 }
   3786 
   3787 typedef struct
   3788 {
   3789    resample_row_func resample;
   3790    stbi_uc *line0,*line1;
   3791    int hs,vs;   // expansion factor in each axis
   3792    int w_lores; // horizontal pixels pre-expansion
   3793    int ystep;   // how far through vertical expansion we are
   3794    int ypos;    // which pre-expansion row we're on
   3795 } stbi__resample;
   3796 
   3797 // fast 0..255 * 0..255 => 0..255 rounded multiplication
   3798 static stbi_uc stbi__blinn_8x8(stbi_uc x, stbi_uc y)
   3799 {
   3800    unsigned int t = x*y + 128;
   3801    return (stbi_uc) ((t + (t >>8)) >> 8);
   3802 }
   3803 
   3804 static stbi_uc *load_jpeg_image(stbi__jpeg *z, int *out_x, int *out_y, int *comp, int req_comp)
   3805 {
   3806    int n, decode_n, is_rgb;
   3807    z->s->img_n = 0; // make stbi__cleanup_jpeg safe
   3808 
   3809    // validate req_comp
   3810    if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
   3811 
   3812    // load a jpeg image from whichever source, but leave in YCbCr format
   3813    if (!stbi__decode_jpeg_image(z)) { stbi__cleanup_jpeg(z); return NULL; }
   3814 
   3815    // determine actual number of components to generate
   3816    n = req_comp ? req_comp : z->s->img_n >= 3 ? 3 : 1;
   3817 
   3818    is_rgb = z->s->img_n == 3 && (z->rgb == 3 || (z->app14_color_transform == 0 && !z->jfif));
   3819 
   3820    if (z->s->img_n == 3 && n < 3 && !is_rgb)
   3821       decode_n = 1;
   3822    else
   3823       decode_n = z->s->img_n;
   3824 
   3825    // nothing to do if no components requested; check this now to avoid
   3826    // accessing uninitialized coutput[0] later
   3827    if (decode_n <= 0) { stbi__cleanup_jpeg(z); return NULL; }
   3828 
   3829    // resample and color-convert
   3830    {
   3831       int k;
   3832       unsigned int i,j;
   3833       stbi_uc *output;
   3834       stbi_uc *coutput[4] = { NULL, NULL, NULL, NULL };
   3835 
   3836       stbi__resample res_comp[4];
   3837 
   3838       for (k=0; k < decode_n; ++k) {
   3839          stbi__resample *r = &res_comp[k];
   3840 
   3841          // allocate line buffer big enough for upsampling off the edges
   3842          // with upsample factor of 4
   3843          z->img_comp[k].linebuf = (stbi_uc *) stbi__malloc(z->s->img_x + 3);
   3844          if (!z->img_comp[k].linebuf) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }
   3845 
   3846          r->hs      = z->img_h_max / z->img_comp[k].h;
   3847          r->vs      = z->img_v_max / z->img_comp[k].v;
   3848          r->ystep   = r->vs >> 1;
   3849          r->w_lores = (z->s->img_x + r->hs-1) / r->hs;
   3850          r->ypos    = 0;
   3851          r->line0   = r->line1 = z->img_comp[k].data;
   3852 
   3853          if      (r->hs == 1 && r->vs == 1) r->resample = resample_row_1;
   3854          else if (r->hs == 1 && r->vs == 2) r->resample = stbi__resample_row_v_2;
   3855          else if (r->hs == 2 && r->vs == 1) r->resample = stbi__resample_row_h_2;
   3856          else if (r->hs == 2 && r->vs == 2) r->resample = z->resample_row_hv_2_kernel;
   3857          else                               r->resample = stbi__resample_row_generic;
   3858       }
   3859 
   3860       // can't error after this so, this is safe
   3861       output = (stbi_uc *) stbi__malloc_mad3(n, z->s->img_x, z->s->img_y, 1);
   3862       if (!output) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }
   3863 
   3864       // now go ahead and resample
   3865       for (j=0; j < z->s->img_y; ++j) {
   3866          stbi_uc *out = output + n * z->s->img_x * j;
   3867          for (k=0; k < decode_n; ++k) {
   3868             stbi__resample *r = &res_comp[k];
   3869             int y_bot = r->ystep >= (r->vs >> 1);
   3870             coutput[k] = r->resample(z->img_comp[k].linebuf,
   3871                                      y_bot ? r->line1 : r->line0,
   3872                                      y_bot ? r->line0 : r->line1,
   3873                                      r->w_lores, r->hs);
   3874             if (++r->ystep >= r->vs) {
   3875                r->ystep = 0;
   3876                r->line0 = r->line1;
   3877                if (++r->ypos < z->img_comp[k].y)
   3878                   r->line1 += z->img_comp[k].w2;
   3879             }
   3880          }
   3881          if (n >= 3) {
   3882             stbi_uc *y = coutput[0];
   3883             if (z->s->img_n == 3) {
   3884                if (is_rgb) {
   3885                   for (i=0; i < z->s->img_x; ++i) {
   3886                      out[0] = y[i];
   3887                      out[1] = coutput[1][i];
   3888                      out[2] = coutput[2][i];
   3889                      out[3] = 255;
   3890                      out += n;
   3891                   }
   3892                } else {
   3893                   z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
   3894                }
   3895             } else if (z->s->img_n == 4) {
   3896                if (z->app14_color_transform == 0) { // CMYK
   3897                   for (i=0; i < z->s->img_x; ++i) {
   3898                      stbi_uc m = coutput[3][i];
   3899                      out[0] = stbi__blinn_8x8(coutput[0][i], m);
   3900                      out[1] = stbi__blinn_8x8(coutput[1][i], m);
   3901                      out[2] = stbi__blinn_8x8(coutput[2][i], m);
   3902                      out[3] = 255;
   3903                      out += n;
   3904                   }
   3905                } else if (z->app14_color_transform == 2) { // YCCK
   3906                   z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
   3907                   for (i=0; i < z->s->img_x; ++i) {
   3908                      stbi_uc m = coutput[3][i];
   3909                      out[0] = stbi__blinn_8x8(255 - out[0], m);
   3910                      out[1] = stbi__blinn_8x8(255 - out[1], m);
   3911                      out[2] = stbi__blinn_8x8(255 - out[2], m);
   3912                      out += n;
   3913                   }
   3914                } else { // YCbCr + alpha?  Ignore the fourth channel for now
   3915                   z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
   3916                }
   3917             } else
   3918                for (i=0; i < z->s->img_x; ++i) {
   3919                   out[0] = out[1] = out[2] = y[i];
   3920                   out[3] = 255; // not used if n==3
   3921                   out += n;
   3922                }
   3923          } else {
   3924             if (is_rgb) {
   3925                if (n == 1)
   3926                   for (i=0; i < z->s->img_x; ++i)
   3927                      *out++ = stbi__compute_y(coutput[0][i], coutput[1][i], coutput[2][i]);
   3928                else {
   3929                   for (i=0; i < z->s->img_x; ++i, out += 2) {
   3930                      out[0] = stbi__compute_y(coutput[0][i], coutput[1][i], coutput[2][i]);
   3931                      out[1] = 255;
   3932                   }
   3933                }
   3934             } else if (z->s->img_n == 4 && z->app14_color_transform == 0) {
   3935                for (i=0; i < z->s->img_x; ++i) {
   3936                   stbi_uc m = coutput[3][i];
   3937                   stbi_uc r = stbi__blinn_8x8(coutput[0][i], m);
   3938                   stbi_uc g = stbi__blinn_8x8(coutput[1][i], m);
   3939                   stbi_uc b = stbi__blinn_8x8(coutput[2][i], m);
   3940                   out[0] = stbi__compute_y(r, g, b);
   3941                   out[1] = 255;
   3942                   out += n;
   3943                }
   3944             } else if (z->s->img_n == 4 && z->app14_color_transform == 2) {
   3945                for (i=0; i < z->s->img_x; ++i) {
   3946                   out[0] = stbi__blinn_8x8(255 - coutput[0][i], coutput[3][i]);
   3947                   out[1] = 255;
   3948                   out += n;
   3949                }
   3950             } else {
   3951                stbi_uc *y = coutput[0];
   3952                if (n == 1)
   3953                   for (i=0; i < z->s->img_x; ++i) out[i] = y[i];
   3954                else
   3955                   for (i=0; i < z->s->img_x; ++i) { *out++ = y[i]; *out++ = 255; }
   3956             }
   3957          }
   3958       }
   3959       stbi__cleanup_jpeg(z);
   3960       *out_x = z->s->img_x;
   3961       *out_y = z->s->img_y;
   3962       if (comp) *comp = z->s->img_n >= 3 ? 3 : 1; // report original components, not output
   3963       return output;
   3964    }
   3965 }
   3966 
   3967 static void *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
   3968 {
   3969    unsigned char* result;
   3970    stbi__jpeg* j = (stbi__jpeg*) stbi__malloc(sizeof(stbi__jpeg));
   3971    if (!j) return stbi__errpuc("outofmem", "Out of memory");
   3972    STBI_NOTUSED(ri);
   3973    j->s = s;
   3974    stbi__setup_jpeg(j);
   3975    result = load_jpeg_image(j, x,y,comp,req_comp);
   3976    STBI_FREE(j);
   3977    return result;
   3978 }
   3979 
   3980 static int stbi__jpeg_test(stbi__context *s)
   3981 {
   3982    int r;
   3983    stbi__jpeg* j = (stbi__jpeg*)stbi__malloc(sizeof(stbi__jpeg));
   3984    if (!j) return stbi__err("outofmem", "Out of memory");
   3985    j->s = s;
   3986    stbi__setup_jpeg(j);
   3987    r = stbi__decode_jpeg_header(j, STBI__SCAN_type);
   3988    stbi__rewind(s);
   3989    STBI_FREE(j);
   3990    return r;
   3991 }
   3992 
   3993 static int stbi__jpeg_info_raw(stbi__jpeg *j, int *x, int *y, int *comp)
   3994 {
   3995    if (!stbi__decode_jpeg_header(j, STBI__SCAN_header)) {
   3996       stbi__rewind( j->s );
   3997       return 0;
   3998    }
   3999    if (x) *x = j->s->img_x;
   4000    if (y) *y = j->s->img_y;
   4001    if (comp) *comp = j->s->img_n >= 3 ? 3 : 1;
   4002    return 1;
   4003 }
   4004 
   4005 static int stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp)
   4006 {
   4007    int result;
   4008    stbi__jpeg* j = (stbi__jpeg*) (stbi__malloc(sizeof(stbi__jpeg)));
   4009    if (!j) return stbi__err("outofmem", "Out of memory");
   4010    j->s = s;
   4011    result = stbi__jpeg_info_raw(j, x, y, comp);
   4012    STBI_FREE(j);
   4013    return result;
   4014 }
   4015 #endif
   4016 
   4017 // public domain zlib decode    v0.2  Sean Barrett 2006-11-18
   4018 //    simple implementation
   4019 //      - all input must be provided in an upfront buffer
   4020 //      - all output is written to a single output buffer (can malloc/realloc)
   4021 //    performance
   4022 //      - fast huffman
   4023 
   4024 #ifndef STBI_NO_ZLIB
   4025 
   4026 // fast-way is faster to check than jpeg huffman, but slow way is slower
   4027 #define STBI__ZFAST_BITS  9 // accelerate all cases in default tables
   4028 #define STBI__ZFAST_MASK  ((1 << STBI__ZFAST_BITS) - 1)
   4029 #define STBI__ZNSYMS 288 // number of symbols in literal/length alphabet
   4030 
   4031 // zlib-style huffman encoding
   4032 // (jpegs packs from left, zlib from right, so can't share code)
   4033 typedef struct
   4034 {
   4035    stbi__uint16 fast[1 << STBI__ZFAST_BITS];
   4036    stbi__uint16 firstcode[16];
   4037    int maxcode[17];
   4038    stbi__uint16 firstsymbol[16];
   4039    stbi_uc  size[STBI__ZNSYMS];
   4040    stbi__uint16 value[STBI__ZNSYMS];
   4041 } stbi__zhuffman;
   4042 
   4043 stbi_inline static int stbi__bitreverse16(int n)
   4044 {
   4045   n = ((n & 0xAAAA) >>  1) | ((n & 0x5555) << 1);
   4046   n = ((n & 0xCCCC) >>  2) | ((n & 0x3333) << 2);
   4047   n = ((n & 0xF0F0) >>  4) | ((n & 0x0F0F) << 4);
   4048   n = ((n & 0xFF00) >>  8) | ((n & 0x00FF) << 8);
   4049   return n;
   4050 }
   4051 
   4052 stbi_inline static int stbi__bit_reverse(int v, int bits)
   4053 {
   4054    STBI_ASSERT(bits <= 16);
   4055    // to bit reverse n bits, reverse 16 and shift
   4056    // e.g. 11 bits, bit reverse and shift away 5
   4057    return stbi__bitreverse16(v) >> (16-bits);
   4058 }
   4059 
   4060 static int stbi__zbuild_huffman(stbi__zhuffman *z, const stbi_uc *sizelist, int num)
   4061 {
   4062    int i,k=0;
   4063    int code, next_code[16], sizes[17];
   4064 
   4065    // DEFLATE spec for generating codes
   4066    memset(sizes, 0, sizeof(sizes));
   4067    memset(z->fast, 0, sizeof(z->fast));
   4068    for (i=0; i < num; ++i)
   4069       ++sizes[sizelist[i]];
   4070    sizes[0] = 0;
   4071    for (i=1; i < 16; ++i)
   4072       if (sizes[i] > (1 << i))
   4073          return stbi__err("bad sizes", "Corrupt PNG");
   4074    code = 0;
   4075    for (i=1; i < 16; ++i) {
   4076       next_code[i] = code;
   4077       z->firstcode[i] = (stbi__uint16) code;
   4078       z->firstsymbol[i] = (stbi__uint16) k;
   4079       code = (code + sizes[i]);
   4080       if (sizes[i])
   4081          if (code-1 >= (1 << i)) return stbi__err("bad codelengths","Corrupt PNG");
   4082       z->maxcode[i] = code << (16-i); // preshift for inner loop
   4083       code <<= 1;
   4084       k += sizes[i];
   4085    }
   4086    z->maxcode[16] = 0x10000; // sentinel
   4087    for (i=0; i < num; ++i) {
   4088       int s = sizelist[i];
   4089       if (s) {
   4090          int c = next_code[s] - z->firstcode[s] + z->firstsymbol[s];
   4091          stbi__uint16 fastv = (stbi__uint16) ((s << 9) | i);
   4092          z->size [c] = (stbi_uc     ) s;
   4093          z->value[c] = (stbi__uint16) i;
   4094          if (s <= STBI__ZFAST_BITS) {
   4095             int j = stbi__bit_reverse(next_code[s],s);
   4096             while (j < (1 << STBI__ZFAST_BITS)) {
   4097                z->fast[j] = fastv;
   4098                j += (1 << s);
   4099             }
   4100          }
   4101          ++next_code[s];
   4102       }
   4103    }
   4104    return 1;
   4105 }
   4106 
   4107 // zlib-from-memory implementation for PNG reading
   4108 //    because PNG allows splitting the zlib stream arbitrarily,
   4109 //    and it's annoying structurally to have PNG call ZLIB call PNG,
   4110 //    we require PNG read all the IDATs and combine them into a single
   4111 //    memory buffer
   4112 
   4113 typedef struct
   4114 {
   4115    stbi_uc *zbuffer, *zbuffer_end;
   4116    int num_bits;
   4117    stbi__uint32 code_buffer;
   4118 
   4119    char *zout;
   4120    char *zout_start;
   4121    char *zout_end;
   4122    int   z_expandable;
   4123 
   4124    stbi__zhuffman z_length, z_distance;
   4125 } stbi__zbuf;
   4126 
   4127 stbi_inline static int stbi__zeof(stbi__zbuf *z)
   4128 {
   4129    return (z->zbuffer >= z->zbuffer_end);
   4130 }
   4131 
   4132 stbi_inline static stbi_uc stbi__zget8(stbi__zbuf *z)
   4133 {
   4134    return stbi__zeof(z) ? 0 : *z->zbuffer++;
   4135 }
   4136 
   4137 static void stbi__fill_bits(stbi__zbuf *z)
   4138 {
   4139    do {
   4140       if (z->code_buffer >= (1U << z->num_bits)) {
   4141         z->zbuffer = z->zbuffer_end;  /* treat this as EOF so we fail. */
   4142         return;
   4143       }
   4144       z->code_buffer |= (unsigned int) stbi__zget8(z) << z->num_bits;
   4145       z->num_bits += 8;
   4146    } while (z->num_bits <= 24);
   4147 }
   4148 
   4149 stbi_inline static unsigned int stbi__zreceive(stbi__zbuf *z, int n)
   4150 {
   4151    unsigned int k;
   4152    if (z->num_bits < n) stbi__fill_bits(z);
   4153    k = z->code_buffer & ((1 << n) - 1);
   4154    z->code_buffer >>= n;
   4155    z->num_bits -= n;
   4156    return k;
   4157 }
   4158 
   4159 static int stbi__zhuffman_decode_slowpath(stbi__zbuf *a, stbi__zhuffman *z)
   4160 {
   4161    int b,s,k;
   4162    // not resolved by fast table, so compute it the slow way
   4163    // use jpeg approach, which requires MSbits at top
   4164    k = stbi__bit_reverse(a->code_buffer, 16);
   4165    for (s=STBI__ZFAST_BITS+1; ; ++s)
   4166       if (k < z->maxcode[s])
   4167          break;
   4168    if (s >= 16) return -1; // invalid code!
   4169    // code size is s, so:
   4170    b = (k >> (16-s)) - z->firstcode[s] + z->firstsymbol[s];
   4171    if (b >= STBI__ZNSYMS) return -1; // some data was corrupt somewhere!
   4172    if (z->size[b] != s) return -1;  // was originally an assert, but report failure instead.
   4173    a->code_buffer >>= s;
   4174    a->num_bits -= s;
   4175    return z->value[b];
   4176 }
   4177 
   4178 stbi_inline static int stbi__zhuffman_decode(stbi__zbuf *a, stbi__zhuffman *z)
   4179 {
   4180    int b,s;
   4181    if (a->num_bits < 16) {
   4182       if (stbi__zeof(a)) {
   4183          return -1;   /* report error for unexpected end of data. */
   4184       }
   4185       stbi__fill_bits(a);
   4186    }
   4187    b = z->fast[a->code_buffer & STBI__ZFAST_MASK];
   4188    if (b) {
   4189       s = b >> 9;
   4190       a->code_buffer >>= s;
   4191       a->num_bits -= s;
   4192       return b & 511;
   4193    }
   4194    return stbi__zhuffman_decode_slowpath(a, z);
   4195 }
   4196 
   4197 static int stbi__zexpand(stbi__zbuf *z, char *zout, int n)  // need to make room for n bytes
   4198 {
   4199    char *q;
   4200    unsigned int cur, limit, old_limit;
   4201    z->zout = zout;
   4202    if (!z->z_expandable) return stbi__err("output buffer limit","Corrupt PNG");
   4203    cur   = (unsigned int) (z->zout - z->zout_start);
   4204    limit = old_limit = (unsigned) (z->zout_end - z->zout_start);
   4205    if (UINT_MAX - cur < (unsigned) n) return stbi__err("outofmem", "Out of memory");
   4206    while (cur + n > limit) {
   4207       if(limit > UINT_MAX / 2) return stbi__err("outofmem", "Out of memory");
   4208       limit *= 2;
   4209    }
   4210    q = (char *) STBI_REALLOC_SIZED(z->zout_start, old_limit, limit);
   4211    STBI_NOTUSED(old_limit);
   4212    if (q == NULL) return stbi__err("outofmem", "Out of memory");
   4213    z->zout_start = q;
   4214    z->zout       = q + cur;
   4215    z->zout_end   = q + limit;
   4216    return 1;
   4217 }
   4218 
   4219 static const int stbi__zlength_base[31] = {
   4220    3,4,5,6,7,8,9,10,11,13,
   4221    15,17,19,23,27,31,35,43,51,59,
   4222    67,83,99,115,131,163,195,227,258,0,0 };
   4223 
   4224 static const int stbi__zlength_extra[31]=
   4225 { 0,0,0,0,0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,0,0,0 };
   4226 
   4227 static const int stbi__zdist_base[32] = { 1,2,3,4,5,7,9,13,17,25,33,49,65,97,129,193,
   4228 257,385,513,769,1025,1537,2049,3073,4097,6145,8193,12289,16385,24577,0,0};
   4229 
   4230 static const int stbi__zdist_extra[32] =
   4231 { 0,0,0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13};
   4232 
   4233 static int stbi__parse_huffman_block(stbi__zbuf *a)
   4234 {
   4235    char *zout = a->zout;
   4236    for(;;) {
   4237       int z = stbi__zhuffman_decode(a, &a->z_length);
   4238       if (z < 256) {
   4239          if (z < 0) return stbi__err("bad huffman code","Corrupt PNG"); // error in huffman codes
   4240          if (zout >= a->zout_end) {
   4241             if (!stbi__zexpand(a, zout, 1)) return 0;
   4242             zout = a->zout;
   4243          }
   4244          *zout++ = (char) z;
   4245       } else {
   4246          stbi_uc *p;
   4247          int len,dist;
   4248          if (z == 256) {
   4249             a->zout = zout;
   4250             return 1;
   4251          }
   4252          z -= 257;
   4253          len = stbi__zlength_base[z];
   4254          if (stbi__zlength_extra[z]) len += stbi__zreceive(a, stbi__zlength_extra[z]);
   4255          z = stbi__zhuffman_decode(a, &a->z_distance);
   4256          if (z < 0) return stbi__err("bad huffman code","Corrupt PNG");
   4257          dist = stbi__zdist_base[z];
   4258          if (stbi__zdist_extra[z]) dist += stbi__zreceive(a, stbi__zdist_extra[z]);
   4259          if (zout - a->zout_start < dist) return stbi__err("bad dist","Corrupt PNG");
   4260          if (zout + len > a->zout_end) {
   4261             if (!stbi__zexpand(a, zout, len)) return 0;
   4262             zout = a->zout;
   4263          }
   4264          p = (stbi_uc *) (zout - dist);
   4265          if (dist == 1) { // run of one byte; common in images.
   4266             stbi_uc v = *p;
   4267             if (len) { do *zout++ = v; while (--len); }
   4268          } else {
   4269             if (len) { do *zout++ = *p++; while (--len); }
   4270          }
   4271       }
   4272    }
   4273 }
   4274 
   4275 static int stbi__compute_huffman_codes(stbi__zbuf *a)
   4276 {
   4277    static const stbi_uc length_dezigzag[19] = { 16,17,18,0,8,7,9,6,10,5,11,4,12,3,13,2,14,1,15 };
   4278    stbi__zhuffman z_codelength;
   4279    stbi_uc lencodes[286+32+137];//padding for maximum single op
   4280    stbi_uc codelength_sizes[19];
   4281    int i,n;
   4282 
   4283    int hlit  = stbi__zreceive(a,5) + 257;
   4284    int hdist = stbi__zreceive(a,5) + 1;
   4285    int hclen = stbi__zreceive(a,4) + 4;
   4286    int ntot  = hlit + hdist;
   4287 
   4288    memset(codelength_sizes, 0, sizeof(codelength_sizes));
   4289    for (i=0; i < hclen; ++i) {
   4290       int s = stbi__zreceive(a,3);
   4291       codelength_sizes[length_dezigzag[i]] = (stbi_uc) s;
   4292    }
   4293    if (!stbi__zbuild_huffman(&z_codelength, codelength_sizes, 19)) return 0;
   4294 
   4295    n = 0;
   4296    while (n < ntot) {
   4297       int c = stbi__zhuffman_decode(a, &z_codelength);
   4298       if (c < 0 || c >= 19) return stbi__err("bad codelengths", "Corrupt PNG");
   4299       if (c < 16)
   4300          lencodes[n++] = (stbi_uc) c;
   4301       else {
   4302          stbi_uc fill = 0;
   4303          if (c == 16) {
   4304             c = stbi__zreceive(a,2)+3;
   4305             if (n == 0) return stbi__err("bad codelengths", "Corrupt PNG");
   4306             fill = lencodes[n-1];
   4307          } else if (c == 17) {
   4308             c = stbi__zreceive(a,3)+3;
   4309          } else if (c == 18) {
   4310             c = stbi__zreceive(a,7)+11;
   4311          } else {
   4312             return stbi__err("bad codelengths", "Corrupt PNG");
   4313          }
   4314          if (ntot - n < c) return stbi__err("bad codelengths", "Corrupt PNG");
   4315          memset(lencodes+n, fill, c);
   4316          n += c;
   4317       }
   4318    }
   4319    if (n != ntot) return stbi__err("bad codelengths","Corrupt PNG");
   4320    if (!stbi__zbuild_huffman(&a->z_length, lencodes, hlit)) return 0;
   4321    if (!stbi__zbuild_huffman(&a->z_distance, lencodes+hlit, hdist)) return 0;
   4322    return 1;
   4323 }
   4324 
   4325 static int stbi__parse_uncompressed_block(stbi__zbuf *a)
   4326 {
   4327    stbi_uc header[4];
   4328    int len,nlen,k;
   4329    if (a->num_bits & 7)
   4330       stbi__zreceive(a, a->num_bits & 7); // discard
   4331    // drain the bit-packed data into header
   4332    k = 0;
   4333    while (a->num_bits > 0) {
   4334       header[k++] = (stbi_uc) (a->code_buffer & 255); // suppress MSVC run-time check
   4335       a->code_buffer >>= 8;
   4336       a->num_bits -= 8;
   4337    }
   4338    if (a->num_bits < 0) return stbi__err("zlib corrupt","Corrupt PNG");
   4339    // now fill header the normal way
   4340    while (k < 4)
   4341       header[k++] = stbi__zget8(a);
   4342    len  = header[1] * 256 + header[0];
   4343    nlen = header[3] * 256 + header[2];
   4344    if (nlen != (len ^ 0xffff)) return stbi__err("zlib corrupt","Corrupt PNG");
   4345    if (a->zbuffer + len > a->zbuffer_end) return stbi__err("read past buffer","Corrupt PNG");
   4346    if (a->zout + len > a->zout_end)
   4347       if (!stbi__zexpand(a, a->zout, len)) return 0;
   4348    memcpy(a->zout, a->zbuffer, len);
   4349    a->zbuffer += len;
   4350    a->zout += len;
   4351    return 1;
   4352 }
   4353 
   4354 static int stbi__parse_zlib_header(stbi__zbuf *a)
   4355 {
   4356    int cmf   = stbi__zget8(a);
   4357    int cm    = cmf & 15;
   4358    /* int cinfo = cmf >> 4; */
   4359    int flg   = stbi__zget8(a);
   4360    if (stbi__zeof(a)) return stbi__err("bad zlib header","Corrupt PNG"); // zlib spec
   4361    if ((cmf*256+flg) % 31 != 0) return stbi__err("bad zlib header","Corrupt PNG"); // zlib spec
   4362    if (flg & 32) return stbi__err("no preset dict","Corrupt PNG"); // preset dictionary not allowed in png
   4363    if (cm != 8) return stbi__err("bad compression","Corrupt PNG"); // DEFLATE required for png
   4364    // window = 1 << (8 + cinfo)... but who cares, we fully buffer output
   4365    return 1;
   4366 }
   4367 
   4368 static const stbi_uc stbi__zdefault_length[STBI__ZNSYMS] =
   4369 {
   4370    8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
   4371    8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
   4372    8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
   4373    8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
   4374    8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
   4375    9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
   4376    9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
   4377    9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
   4378    7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7, 7,7,7,7,7,7,7,7,8,8,8,8,8,8,8,8
   4379 };
   4380 static const stbi_uc stbi__zdefault_distance[32] =
   4381 {
   4382    5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5
   4383 };
   4384 /*
   4385 Init algorithm:
   4386 {
   4387    int i;   // use <= to match clearly with spec
   4388    for (i=0; i <= 143; ++i)     stbi__zdefault_length[i]   = 8;
   4389    for (   ; i <= 255; ++i)     stbi__zdefault_length[i]   = 9;
   4390    for (   ; i <= 279; ++i)     stbi__zdefault_length[i]   = 7;
   4391    for (   ; i <= 287; ++i)     stbi__zdefault_length[i]   = 8;
   4392 
   4393    for (i=0; i <=  31; ++i)     stbi__zdefault_distance[i] = 5;
   4394 }
   4395 */
   4396 
   4397 static int stbi__parse_zlib(stbi__zbuf *a, int parse_header)
   4398 {
   4399    int final, type;
   4400    if (parse_header)
   4401       if (!stbi__parse_zlib_header(a)) return 0;
   4402    a->num_bits = 0;
   4403    a->code_buffer = 0;
   4404    do {
   4405       final = stbi__zreceive(a,1);
   4406       type = stbi__zreceive(a,2);
   4407       if (type == 0) {
   4408          if (!stbi__parse_uncompressed_block(a)) return 0;
   4409       } else if (type == 3) {
   4410          return 0;
   4411       } else {
   4412          if (type == 1) {
   4413             // use fixed code lengths
   4414             if (!stbi__zbuild_huffman(&a->z_length  , stbi__zdefault_length  , STBI__ZNSYMS)) return 0;
   4415             if (!stbi__zbuild_huffman(&a->z_distance, stbi__zdefault_distance,  32)) return 0;
   4416          } else {
   4417             if (!stbi__compute_huffman_codes(a)) return 0;
   4418          }
   4419          if (!stbi__parse_huffman_block(a)) return 0;
   4420       }
   4421    } while (!final);
   4422    return 1;
   4423 }
   4424 
   4425 static int stbi__do_zlib(stbi__zbuf *a, char *obuf, int olen, int exp, int parse_header)
   4426 {
   4427    a->zout_start = obuf;
   4428    a->zout       = obuf;
   4429    a->zout_end   = obuf + olen;
   4430    a->z_expandable = exp;
   4431 
   4432    return stbi__parse_zlib(a, parse_header);
   4433 }
   4434 
   4435 STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen)
   4436 {
   4437    stbi__zbuf a;
   4438    char *p = (char *) stbi__malloc(initial_size);
   4439    if (p == NULL) return NULL;
   4440    a.zbuffer = (stbi_uc *) buffer;
   4441    a.zbuffer_end = (stbi_uc *) buffer + len;
   4442    if (stbi__do_zlib(&a, p, initial_size, 1, 1)) {
   4443       if (outlen) *outlen = (int) (a.zout - a.zout_start);
   4444       return a.zout_start;
   4445    } else {
   4446       STBI_FREE(a.zout_start);
   4447       return NULL;
   4448    }
   4449 }
   4450 
   4451 STBIDEF char *stbi_zlib_decode_malloc(char const *buffer, int len, int *outlen)
   4452 {
   4453    return stbi_zlib_decode_malloc_guesssize(buffer, len, 16384, outlen);
   4454 }
   4455 
   4456 STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header)
   4457 {
   4458    stbi__zbuf a;
   4459    char *p = (char *) stbi__malloc(initial_size);
   4460    if (p == NULL) return NULL;
   4461    a.zbuffer = (stbi_uc *) buffer;
   4462    a.zbuffer_end = (stbi_uc *) buffer + len;
   4463    if (stbi__do_zlib(&a, p, initial_size, 1, parse_header)) {
   4464       if (outlen) *outlen = (int) (a.zout - a.zout_start);
   4465       return a.zout_start;
   4466    } else {
   4467       STBI_FREE(a.zout_start);
   4468       return NULL;
   4469    }
   4470 }
   4471 
   4472 STBIDEF int stbi_zlib_decode_buffer(char *obuffer, int olen, char const *ibuffer, int ilen)
   4473 {
   4474    stbi__zbuf a;
   4475    a.zbuffer = (stbi_uc *) ibuffer;
   4476    a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
   4477    if (stbi__do_zlib(&a, obuffer, olen, 0, 1))
   4478       return (int) (a.zout - a.zout_start);
   4479    else
   4480       return -1;
   4481 }
   4482 
   4483 STBIDEF char *stbi_zlib_decode_noheader_malloc(char const *buffer, int len, int *outlen)
   4484 {
   4485    stbi__zbuf a;
   4486    char *p = (char *) stbi__malloc(16384);
   4487    if (p == NULL) return NULL;
   4488    a.zbuffer = (stbi_uc *) buffer;
   4489    a.zbuffer_end = (stbi_uc *) buffer+len;
   4490    if (stbi__do_zlib(&a, p, 16384, 1, 0)) {
   4491       if (outlen) *outlen = (int) (a.zout - a.zout_start);
   4492       return a.zout_start;
   4493    } else {
   4494       STBI_FREE(a.zout_start);
   4495       return NULL;
   4496    }
   4497 }
   4498 
   4499 STBIDEF int stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen)
   4500 {
   4501    stbi__zbuf a;
   4502    a.zbuffer = (stbi_uc *) ibuffer;
   4503    a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
   4504    if (stbi__do_zlib(&a, obuffer, olen, 0, 0))
   4505       return (int) (a.zout - a.zout_start);
   4506    else
   4507       return -1;
   4508 }
   4509 #endif
   4510 
   4511 // public domain "baseline" PNG decoder   v0.10  Sean Barrett 2006-11-18
   4512 //    simple implementation
   4513 //      - only 8-bit samples
   4514 //      - no CRC checking
   4515 //      - allocates lots of intermediate memory
   4516 //        - avoids problem of streaming data between subsystems
   4517 //        - avoids explicit window management
   4518 //    performance
   4519 //      - uses stb_zlib, a PD zlib implementation with fast huffman decoding
   4520 
   4521 #ifndef STBI_NO_PNG
   4522 typedef struct
   4523 {
   4524    stbi__uint32 length;
   4525    stbi__uint32 type;
   4526 } stbi__pngchunk;
   4527 
   4528 static stbi__pngchunk stbi__get_chunk_header(stbi__context *s)
   4529 {
   4530    stbi__pngchunk c;
   4531    c.length = stbi__get32be(s);
   4532    c.type   = stbi__get32be(s);
   4533    return c;
   4534 }
   4535 
   4536 static int stbi__check_png_header(stbi__context *s)
   4537 {
   4538    static const stbi_uc png_sig[8] = { 137,80,78,71,13,10,26,10 };
   4539    int i;
   4540    for (i=0; i < 8; ++i)
   4541       if (stbi__get8(s) != png_sig[i]) return stbi__err("bad png sig","Not a PNG");
   4542    return 1;
   4543 }
   4544 
   4545 typedef struct
   4546 {
   4547    stbi__context *s;
   4548    stbi_uc *idata, *expanded, *out;
   4549    int depth;
   4550 } stbi__png;
   4551 
   4552 
   4553 enum {
   4554    STBI__F_none=0,
   4555    STBI__F_sub=1,
   4556    STBI__F_up=2,
   4557    STBI__F_avg=3,
   4558    STBI__F_paeth=4,
   4559    // synthetic filters used for first scanline to avoid needing a dummy row of 0s
   4560    STBI__F_avg_first,
   4561    STBI__F_paeth_first
   4562 };
   4563 
   4564 static stbi_uc first_row_filter[5] =
   4565 {
   4566    STBI__F_none,
   4567    STBI__F_sub,
   4568    STBI__F_none,
   4569    STBI__F_avg_first,
   4570    STBI__F_paeth_first
   4571 };
   4572 
   4573 static int stbi__paeth(int a, int b, int c)
   4574 {
   4575    int p = a + b - c;
   4576    int pa = abs(p-a);
   4577    int pb = abs(p-b);
   4578    int pc = abs(p-c);
   4579    if (pa <= pb && pa <= pc) return a;
   4580    if (pb <= pc) return b;
   4581    return c;
   4582 }
   4583 
   4584 static const stbi_uc stbi__depth_scale_table[9] = { 0, 0xff, 0x55, 0, 0x11, 0,0,0, 0x01 };
   4585 
   4586 // create the png data from post-deflated data
   4587 static int stbi__create_png_image_raw(stbi__png *a, stbi_uc *raw, stbi__uint32 raw_len, int out_n, stbi__uint32 x, stbi__uint32 y, int depth, int color)
   4588 {
   4589    int bytes = (depth == 16? 2 : 1);
   4590    stbi__context *s = a->s;
   4591    stbi__uint32 i,j,stride = x*out_n*bytes;
   4592    stbi__uint32 img_len, img_width_bytes;
   4593    int k;
   4594    int img_n = s->img_n; // copy it into a local for later
   4595 
   4596    int output_bytes = out_n*bytes;
   4597    int filter_bytes = img_n*bytes;
   4598    int width = x;
   4599 
   4600    STBI_ASSERT(out_n == s->img_n || out_n == s->img_n+1);
   4601    a->out = (stbi_uc *) stbi__malloc_mad3(x, y, output_bytes, 0); // extra bytes to write off the end into
   4602    if (!a->out) return stbi__err("outofmem", "Out of memory");
   4603 
   4604    if (!stbi__mad3sizes_valid(img_n, x, depth, 7)) return stbi__err("too large", "Corrupt PNG");
   4605    img_width_bytes = (((img_n * x * depth) + 7) >> 3);
   4606    img_len = (img_width_bytes + 1) * y;
   4607 
   4608    // we used to check for exact match between raw_len and img_len on non-interlaced PNGs,
   4609    // but issue #276 reported a PNG in the wild that had extra data at the end (all zeros),
   4610    // so just check for raw_len < img_len always.
   4611    if (raw_len < img_len) return stbi__err("not enough pixels","Corrupt PNG");
   4612 
   4613    for (j=0; j < y; ++j) {
   4614       stbi_uc *cur = a->out + stride*j;
   4615       stbi_uc *prior;
   4616       int filter = *raw++;
   4617 
   4618       if (filter > 4)
   4619          return stbi__err("invalid filter","Corrupt PNG");
   4620 
   4621       if (depth < 8) {
   4622          if (img_width_bytes > x) return stbi__err("invalid width","Corrupt PNG");
   4623          cur += x*out_n - img_width_bytes; // store output to the rightmost img_len bytes, so we can decode in place
   4624          filter_bytes = 1;
   4625          width = img_width_bytes;
   4626       }
   4627       prior = cur - stride; // bugfix: need to compute this after 'cur +=' computation above
   4628 
   4629       // if first row, use special filter that doesn't sample previous row
   4630       if (j == 0) filter = first_row_filter[filter];
   4631 
   4632       // handle first byte explicitly
   4633       for (k=0; k < filter_bytes; ++k) {
   4634          switch (filter) {
   4635             case STBI__F_none       : cur[k] = raw[k]; break;
   4636             case STBI__F_sub        : cur[k] = raw[k]; break;
   4637             case STBI__F_up         : cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break;
   4638             case STBI__F_avg        : cur[k] = STBI__BYTECAST(raw[k] + (prior[k]>>1)); break;
   4639             case STBI__F_paeth      : cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(0,prior[k],0)); break;
   4640             case STBI__F_avg_first  : cur[k] = raw[k]; break;
   4641             case STBI__F_paeth_first: cur[k] = raw[k]; break;
   4642          }
   4643       }
   4644 
   4645       if (depth == 8) {
   4646          if (img_n != out_n)
   4647             cur[img_n] = 255; // first pixel
   4648          raw += img_n;
   4649          cur += out_n;
   4650          prior += out_n;
   4651       } else if (depth == 16) {
   4652          if (img_n != out_n) {
   4653             cur[filter_bytes]   = 255; // first pixel top byte
   4654             cur[filter_bytes+1] = 255; // first pixel bottom byte
   4655          }
   4656          raw += filter_bytes;
   4657          cur += output_bytes;
   4658          prior += output_bytes;
   4659       } else {
   4660          raw += 1;
   4661          cur += 1;
   4662          prior += 1;
   4663       }
   4664 
   4665       // this is a little gross, so that we don't switch per-pixel or per-component
   4666       if (depth < 8 || img_n == out_n) {
   4667          int nk = (width - 1)*filter_bytes;
   4668          #define STBI__CASE(f) \
   4669              case f:     \
   4670                 for (k=0; k < nk; ++k)
   4671          switch (filter) {
   4672             // "none" filter turns into a memcpy here; make that explicit.
   4673             case STBI__F_none:         memcpy(cur, raw, nk); break;
   4674             STBI__CASE(STBI__F_sub)          { cur[k] = STBI__BYTECAST(raw[k] + cur[k-filter_bytes]); } break;
   4675             STBI__CASE(STBI__F_up)           { cur[k] = STBI__BYTECAST(raw[k] + prior[k]); } break;
   4676             STBI__CASE(STBI__F_avg)          { cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k-filter_bytes])>>1)); } break;
   4677             STBI__CASE(STBI__F_paeth)        { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],prior[k],prior[k-filter_bytes])); } break;
   4678             STBI__CASE(STBI__F_avg_first)    { cur[k] = STBI__BYTECAST(raw[k] + (cur[k-filter_bytes] >> 1)); } break;
   4679             STBI__CASE(STBI__F_paeth_first)  { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],0,0)); } break;
   4680          }
   4681          #undef STBI__CASE
   4682          raw += nk;
   4683       } else {
   4684          STBI_ASSERT(img_n+1 == out_n);
   4685          #define STBI__CASE(f) \
   4686              case f:     \
   4687                 for (i=x-1; i >= 1; --i, cur[filter_bytes]=255,raw+=filter_bytes,cur+=output_bytes,prior+=output_bytes) \
   4688                    for (k=0; k < filter_bytes; ++k)
   4689          switch (filter) {
   4690             STBI__CASE(STBI__F_none)         { cur[k] = raw[k]; } break;
   4691             STBI__CASE(STBI__F_sub)          { cur[k] = STBI__BYTECAST(raw[k] + cur[k- output_bytes]); } break;
   4692             STBI__CASE(STBI__F_up)           { cur[k] = STBI__BYTECAST(raw[k] + prior[k]); } break;
   4693             STBI__CASE(STBI__F_avg)          { cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k- output_bytes])>>1)); } break;
   4694             STBI__CASE(STBI__F_paeth)        { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k- output_bytes],prior[k],prior[k- output_bytes])); } break;
   4695             STBI__CASE(STBI__F_avg_first)    { cur[k] = STBI__BYTECAST(raw[k] + (cur[k- output_bytes] >> 1)); } break;
   4696             STBI__CASE(STBI__F_paeth_first)  { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k- output_bytes],0,0)); } break;
   4697          }
   4698          #undef STBI__CASE
   4699 
   4700          // the loop above sets the high byte of the pixels' alpha, but for
   4701          // 16 bit png files we also need the low byte set. we'll do that here.
   4702          if (depth == 16) {
   4703             cur = a->out + stride*j; // start at the beginning of the row again
   4704             for (i=0; i < x; ++i,cur+=output_bytes) {
   4705                cur[filter_bytes+1] = 255;
   4706             }
   4707          }
   4708       }
   4709    }
   4710 
   4711    // we make a separate pass to expand bits to pixels; for performance,
   4712    // this could run two scanlines behind the above code, so it won't
   4713    // intefere with filtering but will still be in the cache.
   4714    if (depth < 8) {
   4715       for (j=0; j < y; ++j) {
   4716          stbi_uc *cur = a->out + stride*j;
   4717          stbi_uc *in  = a->out + stride*j + x*out_n - img_width_bytes;
   4718          // unpack 1/2/4-bit into a 8-bit buffer. allows us to keep the common 8-bit path optimal at minimal cost for 1/2/4-bit
   4719          // png guarante byte alignment, if width is not multiple of 8/4/2 we'll decode dummy trailing data that will be skipped in the later loop
   4720          stbi_uc scale = (color == 0) ? stbi__depth_scale_table[depth] : 1; // scale grayscale values to 0..255 range
   4721 
   4722          // note that the final byte might overshoot and write more data than desired.
   4723          // we can allocate enough data that this never writes out of memory, but it
   4724          // could also overwrite the next scanline. can it overwrite non-empty data
   4725          // on the next scanline? yes, consider 1-pixel-wide scanlines with 1-bit-per-pixel.
   4726          // so we need to explicitly clamp the final ones
   4727 
   4728          if (depth == 4) {
   4729             for (k=x*img_n; k >= 2; k-=2, ++in) {
   4730                *cur++ = scale * ((*in >> 4)       );
   4731                *cur++ = scale * ((*in     ) & 0x0f);
   4732             }
   4733             if (k > 0) *cur++ = scale * ((*in >> 4)       );
   4734          } else if (depth == 2) {
   4735             for (k=x*img_n; k >= 4; k-=4, ++in) {
   4736                *cur++ = scale * ((*in >> 6)       );
   4737                *cur++ = scale * ((*in >> 4) & 0x03);
   4738                *cur++ = scale * ((*in >> 2) & 0x03);
   4739                *cur++ = scale * ((*in     ) & 0x03);
   4740             }
   4741             if (k > 0) *cur++ = scale * ((*in >> 6)       );
   4742             if (k > 1) *cur++ = scale * ((*in >> 4) & 0x03);
   4743             if (k > 2) *cur++ = scale * ((*in >> 2) & 0x03);
   4744          } else if (depth == 1) {
   4745             for (k=x*img_n; k >= 8; k-=8, ++in) {
   4746                *cur++ = scale * ((*in >> 7)       );
   4747                *cur++ = scale * ((*in >> 6) & 0x01);
   4748                *cur++ = scale * ((*in >> 5) & 0x01);
   4749                *cur++ = scale * ((*in >> 4) & 0x01);
   4750                *cur++ = scale * ((*in >> 3) & 0x01);
   4751                *cur++ = scale * ((*in >> 2) & 0x01);
   4752                *cur++ = scale * ((*in >> 1) & 0x01);
   4753                *cur++ = scale * ((*in     ) & 0x01);
   4754             }
   4755             if (k > 0) *cur++ = scale * ((*in >> 7)       );
   4756             if (k > 1) *cur++ = scale * ((*in >> 6) & 0x01);
   4757             if (k > 2) *cur++ = scale * ((*in >> 5) & 0x01);
   4758             if (k > 3) *cur++ = scale * ((*in >> 4) & 0x01);
   4759             if (k > 4) *cur++ = scale * ((*in >> 3) & 0x01);
   4760             if (k > 5) *cur++ = scale * ((*in >> 2) & 0x01);
   4761             if (k > 6) *cur++ = scale * ((*in >> 1) & 0x01);
   4762          }
   4763          if (img_n != out_n) {
   4764             int q;
   4765             // insert alpha = 255
   4766             cur = a->out + stride*j;
   4767             if (img_n == 1) {
   4768                for (q=x-1; q >= 0; --q) {
   4769                   cur[q*2+1] = 255;
   4770                   cur[q*2+0] = cur[q];
   4771                }
   4772             } else {
   4773                STBI_ASSERT(img_n == 3);
   4774                for (q=x-1; q >= 0; --q) {
   4775                   cur[q*4+3] = 255;
   4776                   cur[q*4+2] = cur[q*3+2];
   4777                   cur[q*4+1] = cur[q*3+1];
   4778                   cur[q*4+0] = cur[q*3+0];
   4779                }
   4780             }
   4781          }
   4782       }
   4783    } else if (depth == 16) {
   4784       // force the image data from big-endian to platform-native.
   4785       // this is done in a separate pass due to the decoding relying
   4786       // on the data being untouched, but could probably be done
   4787       // per-line during decode if care is taken.
   4788       stbi_uc *cur = a->out;
   4789       stbi__uint16 *cur16 = (stbi__uint16*)cur;
   4790 
   4791       for(i=0; i < x*y*out_n; ++i,cur16++,cur+=2) {
   4792          *cur16 = (cur[0] << 8) | cur[1];
   4793       }
   4794    }
   4795 
   4796    return 1;
   4797 }
   4798 
   4799 static int stbi__create_png_image(stbi__png *a, stbi_uc *image_data, stbi__uint32 image_data_len, int out_n, int depth, int color, int interlaced)
   4800 {
   4801    int bytes = (depth == 16 ? 2 : 1);
   4802    int out_bytes = out_n * bytes;
   4803    stbi_uc *final;
   4804    int p;
   4805    if (!interlaced)
   4806       return stbi__create_png_image_raw(a, image_data, image_data_len, out_n, a->s->img_x, a->s->img_y, depth, color);
   4807 
   4808    // de-interlacing
   4809    final = (stbi_uc *) stbi__malloc_mad3(a->s->img_x, a->s->img_y, out_bytes, 0);
   4810    if (!final) return stbi__err("outofmem", "Out of memory");
   4811    for (p=0; p < 7; ++p) {
   4812       int xorig[] = { 0,4,0,2,0,1,0 };
   4813       int yorig[] = { 0,0,4,0,2,0,1 };
   4814       int xspc[]  = { 8,8,4,4,2,2,1 };
   4815       int yspc[]  = { 8,8,8,4,4,2,2 };
   4816       int i,j,x,y;
   4817       // pass1_x[4] = 0, pass1_x[5] = 1, pass1_x[12] = 1
   4818       x = (a->s->img_x - xorig[p] + xspc[p]-1) / xspc[p];
   4819       y = (a->s->img_y - yorig[p] + yspc[p]-1) / yspc[p];
   4820       if (x && y) {
   4821          stbi__uint32 img_len = ((((a->s->img_n * x * depth) + 7) >> 3) + 1) * y;
   4822          if (!stbi__create_png_image_raw(a, image_data, image_data_len, out_n, x, y, depth, color)) {
   4823             STBI_FREE(final);
   4824             return 0;
   4825          }
   4826          for (j=0; j < y; ++j) {
   4827             for (i=0; i < x; ++i) {
   4828                int out_y = j*yspc[p]+yorig[p];
   4829                int out_x = i*xspc[p]+xorig[p];
   4830                memcpy(final + out_y*a->s->img_x*out_bytes + out_x*out_bytes,
   4831                       a->out + (j*x+i)*out_bytes, out_bytes);
   4832             }
   4833          }
   4834          STBI_FREE(a->out);
   4835          image_data += img_len;
   4836          image_data_len -= img_len;
   4837       }
   4838    }
   4839    a->out = final;
   4840 
   4841    return 1;
   4842 }
   4843 
   4844 static int stbi__compute_transparency(stbi__png *z, stbi_uc tc[3], int out_n)
   4845 {
   4846    stbi__context *s = z->s;
   4847    stbi__uint32 i, pixel_count = s->img_x * s->img_y;
   4848    stbi_uc *p = z->out;
   4849 
   4850    // compute color-based transparency, assuming we've
   4851    // already got 255 as the alpha value in the output
   4852    STBI_ASSERT(out_n == 2 || out_n == 4);
   4853 
   4854    if (out_n == 2) {
   4855       for (i=0; i < pixel_count; ++i) {
   4856          p[1] = (p[0] == tc[0] ? 0 : 255);
   4857          p += 2;
   4858       }
   4859    } else {
   4860       for (i=0; i < pixel_count; ++i) {
   4861          if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
   4862             p[3] = 0;
   4863          p += 4;
   4864       }
   4865    }
   4866    return 1;
   4867 }
   4868 
   4869 static int stbi__compute_transparency16(stbi__png *z, stbi__uint16 tc[3], int out_n)
   4870 {
   4871    stbi__context *s = z->s;
   4872    stbi__uint32 i, pixel_count = s->img_x * s->img_y;
   4873    stbi__uint16 *p = (stbi__uint16*) z->out;
   4874 
   4875    // compute color-based transparency, assuming we've
   4876    // already got 65535 as the alpha value in the output
   4877    STBI_ASSERT(out_n == 2 || out_n == 4);
   4878 
   4879    if (out_n == 2) {
   4880       for (i = 0; i < pixel_count; ++i) {
   4881          p[1] = (p[0] == tc[0] ? 0 : 65535);
   4882          p += 2;
   4883       }
   4884    } else {
   4885       for (i = 0; i < pixel_count; ++i) {
   4886          if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
   4887             p[3] = 0;
   4888          p += 4;
   4889       }
   4890    }
   4891    return 1;
   4892 }
   4893 
   4894 static int stbi__expand_png_palette(stbi__png *a, stbi_uc *palette, int len, int pal_img_n)
   4895 {
   4896    stbi__uint32 i, pixel_count = a->s->img_x * a->s->img_y;
   4897    stbi_uc *p, *temp_out, *orig = a->out;
   4898 
   4899    p = (stbi_uc *) stbi__malloc_mad2(pixel_count, pal_img_n, 0);
   4900    if (p == NULL) return stbi__err("outofmem", "Out of memory");
   4901 
   4902    // between here and free(out) below, exitting would leak
   4903    temp_out = p;
   4904 
   4905    if (pal_img_n == 3) {
   4906       for (i=0; i < pixel_count; ++i) {
   4907          int n = orig[i]*4;
   4908          p[0] = palette[n  ];
   4909          p[1] = palette[n+1];
   4910          p[2] = palette[n+2];
   4911          p += 3;
   4912       }
   4913    } else {
   4914       for (i=0; i < pixel_count; ++i) {
   4915          int n = orig[i]*4;
   4916          p[0] = palette[n  ];
   4917          p[1] = palette[n+1];
   4918          p[2] = palette[n+2];
   4919          p[3] = palette[n+3];
   4920          p += 4;
   4921       }
   4922    }
   4923    STBI_FREE(a->out);
   4924    a->out = temp_out;
   4925 
   4926    STBI_NOTUSED(len);
   4927 
   4928    return 1;
   4929 }
   4930 
   4931 static int stbi__unpremultiply_on_load_global = 0;
   4932 static int stbi__de_iphone_flag_global = 0;
   4933 
   4934 STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply)
   4935 {
   4936    stbi__unpremultiply_on_load_global = flag_true_if_should_unpremultiply;
   4937 }
   4938 
   4939 STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert)
   4940 {
   4941    stbi__de_iphone_flag_global = flag_true_if_should_convert;
   4942 }
   4943 
   4944 #ifndef STBI_THREAD_LOCAL
   4945 #define stbi__unpremultiply_on_load  stbi__unpremultiply_on_load_global
   4946 #define stbi__de_iphone_flag  stbi__de_iphone_flag_global
   4947 #else
   4948 static STBI_THREAD_LOCAL int stbi__unpremultiply_on_load_local, stbi__unpremultiply_on_load_set;
   4949 static STBI_THREAD_LOCAL int stbi__de_iphone_flag_local, stbi__de_iphone_flag_set;
   4950 
   4951 STBIDEF void stbi__unpremultiply_on_load_thread(int flag_true_if_should_unpremultiply)
   4952 {
   4953    stbi__unpremultiply_on_load_local = flag_true_if_should_unpremultiply;
   4954    stbi__unpremultiply_on_load_set = 1;
   4955 }
   4956 
   4957 STBIDEF void stbi_convert_iphone_png_to_rgb_thread(int flag_true_if_should_convert)
   4958 {
   4959    stbi__de_iphone_flag_local = flag_true_if_should_convert;
   4960    stbi__de_iphone_flag_set = 1;
   4961 }
   4962 
   4963 #define stbi__unpremultiply_on_load  (stbi__unpremultiply_on_load_set           \
   4964                                        ? stbi__unpremultiply_on_load_local      \
   4965                                        : stbi__unpremultiply_on_load_global)
   4966 #define stbi__de_iphone_flag  (stbi__de_iphone_flag_set                         \
   4967                                 ? stbi__de_iphone_flag_local                    \
   4968                                 : stbi__de_iphone_flag_global)
   4969 #endif // STBI_THREAD_LOCAL
   4970 
   4971 static void stbi__de_iphone(stbi__png *z)
   4972 {
   4973    stbi__context *s = z->s;
   4974    stbi__uint32 i, pixel_count = s->img_x * s->img_y;
   4975    stbi_uc *p = z->out;
   4976 
   4977    if (s->img_out_n == 3) {  // convert bgr to rgb
   4978       for (i=0; i < pixel_count; ++i) {
   4979          stbi_uc t = p[0];
   4980          p[0] = p[2];
   4981          p[2] = t;
   4982          p += 3;
   4983       }
   4984    } else {
   4985       STBI_ASSERT(s->img_out_n == 4);
   4986       if (stbi__unpremultiply_on_load) {
   4987          // convert bgr to rgb and unpremultiply
   4988          for (i=0; i < pixel_count; ++i) {
   4989             stbi_uc a = p[3];
   4990             stbi_uc t = p[0];
   4991             if (a) {
   4992                stbi_uc half = a / 2;
   4993                p[0] = (p[2] * 255 + half) / a;
   4994                p[1] = (p[1] * 255 + half) / a;
   4995                p[2] = ( t   * 255 + half) / a;
   4996             } else {
   4997                p[0] = p[2];
   4998                p[2] = t;
   4999             }
   5000             p += 4;
   5001          }
   5002       } else {
   5003          // convert bgr to rgb
   5004          for (i=0; i < pixel_count; ++i) {
   5005             stbi_uc t = p[0];
   5006             p[0] = p[2];
   5007             p[2] = t;
   5008             p += 4;
   5009          }
   5010       }
   5011    }
   5012 }
   5013 
   5014 #define STBI__PNG_TYPE(a,b,c,d)  (((unsigned) (a) << 24) + ((unsigned) (b) << 16) + ((unsigned) (c) << 8) + (unsigned) (d))
   5015 
   5016 static int stbi__parse_png_file(stbi__png *z, int scan, int req_comp)
   5017 {
   5018    stbi_uc palette[1024], pal_img_n=0;
   5019    stbi_uc has_trans=0, tc[3]={0};
   5020    stbi__uint16 tc16[3];
   5021    stbi__uint32 ioff=0, idata_limit=0, i, pal_len=0;
   5022    int first=1,k,interlace=0, color=0, is_iphone=0;
   5023    stbi__context *s = z->s;
   5024 
   5025    z->expanded = NULL;
   5026    z->idata = NULL;
   5027    z->out = NULL;
   5028 
   5029    if (!stbi__check_png_header(s)) return 0;
   5030 
   5031    if (scan == STBI__SCAN_type) return 1;
   5032 
   5033    for (;;) {
   5034       stbi__pngchunk c = stbi__get_chunk_header(s);
   5035       switch (c.type) {
   5036          case STBI__PNG_TYPE('C','g','B','I'):
   5037             is_iphone = 1;
   5038             stbi__skip(s, c.length);
   5039             break;
   5040          case STBI__PNG_TYPE('I','H','D','R'): {
   5041             int comp,filter;
   5042             if (!first) return stbi__err("multiple IHDR","Corrupt PNG");
   5043             first = 0;
   5044             if (c.length != 13) return stbi__err("bad IHDR len","Corrupt PNG");
   5045             s->img_x = stbi__get32be(s);
   5046             s->img_y = stbi__get32be(s);
   5047             if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
   5048             if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
   5049             z->depth = stbi__get8(s);  if (z->depth != 1 && z->depth != 2 && z->depth != 4 && z->depth != 8 && z->depth != 16)  return stbi__err("1/2/4/8/16-bit only","PNG not supported: 1/2/4/8/16-bit only");
   5050             color = stbi__get8(s);  if (color > 6)         return stbi__err("bad ctype","Corrupt PNG");
   5051             if (color == 3 && z->depth == 16)                  return stbi__err("bad ctype","Corrupt PNG");
   5052             if (color == 3) pal_img_n = 3; else if (color & 1) return stbi__err("bad ctype","Corrupt PNG");
   5053             comp  = stbi__get8(s);  if (comp) return stbi__err("bad comp method","Corrupt PNG");
   5054             filter= stbi__get8(s);  if (filter) return stbi__err("bad filter method","Corrupt PNG");
   5055             interlace = stbi__get8(s); if (interlace>1) return stbi__err("bad interlace method","Corrupt PNG");
   5056             if (!s->img_x || !s->img_y) return stbi__err("0-pixel image","Corrupt PNG");
   5057             if (!pal_img_n) {
   5058                s->img_n = (color & 2 ? 3 : 1) + (color & 4 ? 1 : 0);
   5059                if ((1 << 30) / s->img_x / s->img_n < s->img_y) return stbi__err("too large", "Image too large to decode");
   5060                if (scan == STBI__SCAN_header) return 1;
   5061             } else {
   5062                // if paletted, then pal_n is our final components, and
   5063                // img_n is # components to decompress/filter.
   5064                s->img_n = 1;
   5065                if ((1 << 30) / s->img_x / 4 < s->img_y) return stbi__err("too large","Corrupt PNG");
   5066                // if SCAN_header, have to scan to see if we have a tRNS
   5067             }
   5068             break;
   5069          }
   5070 
   5071          case STBI__PNG_TYPE('P','L','T','E'):  {
   5072             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
   5073             if (c.length > 256*3) return stbi__err("invalid PLTE","Corrupt PNG");
   5074             pal_len = c.length / 3;
   5075             if (pal_len * 3 != c.length) return stbi__err("invalid PLTE","Corrupt PNG");
   5076             for (i=0; i < pal_len; ++i) {
   5077                palette[i*4+0] = stbi__get8(s);
   5078                palette[i*4+1] = stbi__get8(s);
   5079                palette[i*4+2] = stbi__get8(s);
   5080                palette[i*4+3] = 255;
   5081             }
   5082             break;
   5083          }
   5084 
   5085          case STBI__PNG_TYPE('t','R','N','S'): {
   5086             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
   5087             if (z->idata) return stbi__err("tRNS after IDAT","Corrupt PNG");
   5088             if (pal_img_n) {
   5089                if (scan == STBI__SCAN_header) { s->img_n = 4; return 1; }
   5090                if (pal_len == 0) return stbi__err("tRNS before PLTE","Corrupt PNG");
   5091                if (c.length > pal_len) return stbi__err("bad tRNS len","Corrupt PNG");
   5092                pal_img_n = 4;
   5093                for (i=0; i < c.length; ++i)
   5094                   palette[i*4+3] = stbi__get8(s);
   5095             } else {
   5096                if (!(s->img_n & 1)) return stbi__err("tRNS with alpha","Corrupt PNG");
   5097                if (c.length != (stbi__uint32) s->img_n*2) return stbi__err("bad tRNS len","Corrupt PNG");
   5098                has_trans = 1;
   5099                if (z->depth == 16) {
   5100                   for (k = 0; k < s->img_n; ++k) tc16[k] = (stbi__uint16)stbi__get16be(s); // copy the values as-is
   5101                } else {
   5102                   for (k = 0; k < s->img_n; ++k) tc[k] = (stbi_uc)(stbi__get16be(s) & 255) * stbi__depth_scale_table[z->depth]; // non 8-bit images will be larger
   5103                }
   5104             }
   5105             break;
   5106          }
   5107 
   5108          case STBI__PNG_TYPE('I','D','A','T'): {
   5109             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
   5110             if (pal_img_n && !pal_len) return stbi__err("no PLTE","Corrupt PNG");
   5111             if (scan == STBI__SCAN_header) { s->img_n = pal_img_n; return 1; }
   5112             if ((int)(ioff + c.length) < (int)ioff) return 0;
   5113             if (ioff + c.length > idata_limit) {
   5114                stbi__uint32 idata_limit_old = idata_limit;
   5115                stbi_uc *p;
   5116                if (idata_limit == 0) idata_limit = c.length > 4096 ? c.length : 4096;
   5117                while (ioff + c.length > idata_limit)
   5118                   idata_limit *= 2;
   5119                STBI_NOTUSED(idata_limit_old);
   5120                p = (stbi_uc *) STBI_REALLOC_SIZED(z->idata, idata_limit_old, idata_limit); if (p == NULL) return stbi__err("outofmem", "Out of memory");
   5121                z->idata = p;
   5122             }
   5123             if (!stbi__getn(s, z->idata+ioff,c.length)) return stbi__err("outofdata","Corrupt PNG");
   5124             ioff += c.length;
   5125             break;
   5126          }
   5127 
   5128          case STBI__PNG_TYPE('I','E','N','D'): {
   5129             stbi__uint32 raw_len, bpl;
   5130             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
   5131             if (scan != STBI__SCAN_load) return 1;
   5132             if (z->idata == NULL) return stbi__err("no IDAT","Corrupt PNG");
   5133             // initial guess for decoded data size to avoid unnecessary reallocs
   5134             bpl = (s->img_x * z->depth + 7) / 8; // bytes per line, per component
   5135             raw_len = bpl * s->img_y * s->img_n /* pixels */ + s->img_y /* filter mode per row */;
   5136             z->expanded = (stbi_uc *) stbi_zlib_decode_malloc_guesssize_headerflag((char *) z->idata, ioff, raw_len, (int *) &raw_len, !is_iphone);
   5137             if (z->expanded == NULL) return 0; // zlib should set error
   5138             STBI_FREE(z->idata); z->idata = NULL;
   5139             if ((req_comp == s->img_n+1 && req_comp != 3 && !pal_img_n) || has_trans)
   5140                s->img_out_n = s->img_n+1;
   5141             else
   5142                s->img_out_n = s->img_n;
   5143             if (!stbi__create_png_image(z, z->expanded, raw_len, s->img_out_n, z->depth, color, interlace)) return 0;
   5144             if (has_trans) {
   5145                if (z->depth == 16) {
   5146                   if (!stbi__compute_transparency16(z, tc16, s->img_out_n)) return 0;
   5147                } else {
   5148                   if (!stbi__compute_transparency(z, tc, s->img_out_n)) return 0;
   5149                }
   5150             }
   5151             if (is_iphone && stbi__de_iphone_flag && s->img_out_n > 2)
   5152                stbi__de_iphone(z);
   5153             if (pal_img_n) {
   5154                // pal_img_n == 3 or 4
   5155                s->img_n = pal_img_n; // record the actual colors we had
   5156                s->img_out_n = pal_img_n;
   5157                if (req_comp >= 3) s->img_out_n = req_comp;
   5158                if (!stbi__expand_png_palette(z, palette, pal_len, s->img_out_n))
   5159                   return 0;
   5160             } else if (has_trans) {
   5161                // non-paletted image with tRNS -> source image has (constant) alpha
   5162                ++s->img_n;
   5163             }
   5164             STBI_FREE(z->expanded); z->expanded = NULL;
   5165             // end of PNG chunk, read and skip CRC
   5166             stbi__get32be(s);
   5167             return 1;
   5168          }
   5169 
   5170          default:
   5171             // if critical, fail
   5172             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
   5173             if ((c.type & (1 << 29)) == 0) {
   5174                #ifndef STBI_NO_FAILURE_STRINGS
   5175                // not threadsafe
   5176                static char invalid_chunk[] = "XXXX PNG chunk not known";
   5177                invalid_chunk[0] = STBI__BYTECAST(c.type >> 24);
   5178                invalid_chunk[1] = STBI__BYTECAST(c.type >> 16);
   5179                invalid_chunk[2] = STBI__BYTECAST(c.type >>  8);
   5180                invalid_chunk[3] = STBI__BYTECAST(c.type >>  0);
   5181                #endif
   5182                return stbi__err(invalid_chunk, "PNG not supported: unknown PNG chunk type");
   5183             }
   5184             stbi__skip(s, c.length);
   5185             break;
   5186       }
   5187       // end of PNG chunk, read and skip CRC
   5188       stbi__get32be(s);
   5189    }
   5190 }
   5191 
   5192 static void *stbi__do_png(stbi__png *p, int *x, int *y, int *n, int req_comp, stbi__result_info *ri)
   5193 {
   5194    void *result=NULL;
   5195    if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
   5196    if (stbi__parse_png_file(p, STBI__SCAN_load, req_comp)) {
   5197       if (p->depth <= 8)
   5198          ri->bits_per_channel = 8;
   5199       else if (p->depth == 16)
   5200          ri->bits_per_channel = 16;
   5201       else
   5202          return stbi__errpuc("bad bits_per_channel", "PNG not supported: unsupported color depth");
   5203       result = p->out;
   5204       p->out = NULL;
   5205       if (req_comp && req_comp != p->s->img_out_n) {
   5206          if (ri->bits_per_channel == 8)
   5207             result = stbi__convert_format((unsigned char *) result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
   5208          else
   5209             result = stbi__convert_format16((stbi__uint16 *) result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
   5210          p->s->img_out_n = req_comp;
   5211          if (result == NULL) return result;
   5212       }
   5213       *x = p->s->img_x;
   5214       *y = p->s->img_y;
   5215       if (n) *n = p->s->img_n;
   5216    }
   5217    STBI_FREE(p->out);      p->out      = NULL;
   5218    STBI_FREE(p->expanded); p->expanded = NULL;
   5219    STBI_FREE(p->idata);    p->idata    = NULL;
   5220 
   5221    return result;
   5222 }
   5223 
   5224 static void *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
   5225 {
   5226    stbi__png p;
   5227    p.s = s;
   5228    return stbi__do_png(&p, x,y,comp,req_comp, ri);
   5229 }
   5230 
   5231 static int stbi__png_test(stbi__context *s)
   5232 {
   5233    int r;
   5234    r = stbi__check_png_header(s);
   5235    stbi__rewind(s);
   5236    return r;
   5237 }
   5238 
   5239 static int stbi__png_info_raw(stbi__png *p, int *x, int *y, int *comp)
   5240 {
   5241    if (!stbi__parse_png_file(p, STBI__SCAN_header, 0)) {
   5242       stbi__rewind( p->s );
   5243       return 0;
   5244    }
   5245    if (x) *x = p->s->img_x;
   5246    if (y) *y = p->s->img_y;
   5247    if (comp) *comp = p->s->img_n;
   5248    return 1;
   5249 }
   5250 
   5251 static int stbi__png_info(stbi__context *s, int *x, int *y, int *comp)
   5252 {
   5253    stbi__png p;
   5254    p.s = s;
   5255    return stbi__png_info_raw(&p, x, y, comp);
   5256 }
   5257 
   5258 static int stbi__png_is16(stbi__context *s)
   5259 {
   5260    stbi__png p;
   5261    p.s = s;
   5262    if (!stbi__png_info_raw(&p, NULL, NULL, NULL))
   5263 	   return 0;
   5264    if (p.depth != 16) {
   5265       stbi__rewind(p.s);
   5266       return 0;
   5267    }
   5268    return 1;
   5269 }
   5270 #endif
   5271 
   5272 // Microsoft/Windows BMP image
   5273 
   5274 #ifndef STBI_NO_BMP
   5275 static int stbi__bmp_test_raw(stbi__context *s)
   5276 {
   5277    int r;
   5278    int sz;
   5279    if (stbi__get8(s) != 'B') return 0;
   5280    if (stbi__get8(s) != 'M') return 0;
   5281    stbi__get32le(s); // discard filesize
   5282    stbi__get16le(s); // discard reserved
   5283    stbi__get16le(s); // discard reserved
   5284    stbi__get32le(s); // discard data offset
   5285    sz = stbi__get32le(s);
   5286    r = (sz == 12 || sz == 40 || sz == 56 || sz == 108 || sz == 124);
   5287    return r;
   5288 }
   5289 
   5290 static int stbi__bmp_test(stbi__context *s)
   5291 {
   5292    int r = stbi__bmp_test_raw(s);
   5293    stbi__rewind(s);
   5294    return r;
   5295 }
   5296 
   5297 
   5298 // returns 0..31 for the highest set bit
   5299 static int stbi__high_bit(unsigned int z)
   5300 {
   5301    int n=0;
   5302    if (z == 0) return -1;
   5303    if (z >= 0x10000) { n += 16; z >>= 16; }
   5304    if (z >= 0x00100) { n +=  8; z >>=  8; }
   5305    if (z >= 0x00010) { n +=  4; z >>=  4; }
   5306    if (z >= 0x00004) { n +=  2; z >>=  2; }
   5307    if (z >= 0x00002) { n +=  1;/* >>=  1;*/ }
   5308    return n;
   5309 }
   5310 
   5311 static int stbi__bitcount(unsigned int a)
   5312 {
   5313    a = (a & 0x55555555) + ((a >>  1) & 0x55555555); // max 2
   5314    a = (a & 0x33333333) + ((a >>  2) & 0x33333333); // max 4
   5315    a = (a + (a >> 4)) & 0x0f0f0f0f; // max 8 per 4, now 8 bits
   5316    a = (a + (a >> 8)); // max 16 per 8 bits
   5317    a = (a + (a >> 16)); // max 32 per 8 bits
   5318    return a & 0xff;
   5319 }
   5320 
   5321 // extract an arbitrarily-aligned N-bit value (N=bits)
   5322 // from v, and then make it 8-bits long and fractionally
   5323 // extend it to full full range.
   5324 static int stbi__shiftsigned(unsigned int v, int shift, int bits)
   5325 {
   5326    static unsigned int mul_table[9] = {
   5327       0,
   5328       0xff/*0b11111111*/, 0x55/*0b01010101*/, 0x49/*0b01001001*/, 0x11/*0b00010001*/,
   5329       0x21/*0b00100001*/, 0x41/*0b01000001*/, 0x81/*0b10000001*/, 0x01/*0b00000001*/,
   5330    };
   5331    static unsigned int shift_table[9] = {
   5332       0, 0,0,1,0,2,4,6,0,
   5333    };
   5334    if (shift < 0)
   5335       v <<= -shift;
   5336    else
   5337       v >>= shift;
   5338    STBI_ASSERT(v < 256);
   5339    v >>= (8-bits);
   5340    STBI_ASSERT(bits >= 0 && bits <= 8);
   5341    return (int) ((unsigned) v * mul_table[bits]) >> shift_table[bits];
   5342 }
   5343 
   5344 typedef struct
   5345 {
   5346    int bpp, offset, hsz;
   5347    unsigned int mr,mg,mb,ma, all_a;
   5348    int extra_read;
   5349 } stbi__bmp_data;
   5350 
   5351 static int stbi__bmp_set_mask_defaults(stbi__bmp_data *info, int compress)
   5352 {
   5353    // BI_BITFIELDS specifies masks explicitly, don't override
   5354    if (compress == 3)
   5355       return 1;
   5356 
   5357    if (compress == 0) {
   5358       if (info->bpp == 16) {
   5359          info->mr = 31u << 10;
   5360          info->mg = 31u <<  5;
   5361          info->mb = 31u <<  0;
   5362       } else if (info->bpp == 32) {
   5363          info->mr = 0xffu << 16;
   5364          info->mg = 0xffu <<  8;
   5365          info->mb = 0xffu <<  0;
   5366          info->ma = 0xffu << 24;
   5367          info->all_a = 0; // if all_a is 0 at end, then we loaded alpha channel but it was all 0
   5368       } else {
   5369          // otherwise, use defaults, which is all-0
   5370          info->mr = info->mg = info->mb = info->ma = 0;
   5371       }
   5372       return 1;
   5373    }
   5374    return 0; // error
   5375 }
   5376 
   5377 static void *stbi__bmp_parse_header(stbi__context *s, stbi__bmp_data *info)
   5378 {
   5379    int hsz;
   5380    if (stbi__get8(s) != 'B' || stbi__get8(s) != 'M') return stbi__errpuc("not BMP", "Corrupt BMP");
   5381    stbi__get32le(s); // discard filesize
   5382    stbi__get16le(s); // discard reserved
   5383    stbi__get16le(s); // discard reserved
   5384    info->offset = stbi__get32le(s);
   5385    info->hsz = hsz = stbi__get32le(s);
   5386    info->mr = info->mg = info->mb = info->ma = 0;
   5387    info->extra_read = 14;
   5388 
   5389    if (info->offset < 0) return stbi__errpuc("bad BMP", "bad BMP");
   5390 
   5391    if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108 && hsz != 124) return stbi__errpuc("unknown BMP", "BMP type not supported: unknown");
   5392    if (hsz == 12) {
   5393       s->img_x = stbi__get16le(s);
   5394       s->img_y = stbi__get16le(s);
   5395    } else {
   5396       s->img_x = stbi__get32le(s);
   5397       s->img_y = stbi__get32le(s);
   5398    }
   5399    if (stbi__get16le(s) != 1) return stbi__errpuc("bad BMP", "bad BMP");
   5400    info->bpp = stbi__get16le(s);
   5401    if (hsz != 12) {
   5402       int compress = stbi__get32le(s);
   5403       if (compress == 1 || compress == 2) return stbi__errpuc("BMP RLE", "BMP type not supported: RLE");
   5404       if (compress >= 4) return stbi__errpuc("BMP JPEG/PNG", "BMP type not supported: unsupported compression"); // this includes PNG/JPEG modes
   5405       if (compress == 3 && info->bpp != 16 && info->bpp != 32) return stbi__errpuc("bad BMP", "bad BMP"); // bitfields requires 16 or 32 bits/pixel
   5406       stbi__get32le(s); // discard sizeof
   5407       stbi__get32le(s); // discard hres
   5408       stbi__get32le(s); // discard vres
   5409       stbi__get32le(s); // discard colorsused
   5410       stbi__get32le(s); // discard max important
   5411       if (hsz == 40 || hsz == 56) {
   5412          if (hsz == 56) {
   5413             stbi__get32le(s);
   5414             stbi__get32le(s);
   5415             stbi__get32le(s);
   5416             stbi__get32le(s);
   5417          }
   5418          if (info->bpp == 16 || info->bpp == 32) {
   5419             if (compress == 0) {
   5420                stbi__bmp_set_mask_defaults(info, compress);
   5421             } else if (compress == 3) {
   5422                info->mr = stbi__get32le(s);
   5423                info->mg = stbi__get32le(s);
   5424                info->mb = stbi__get32le(s);
   5425                info->extra_read += 12;
   5426                // not documented, but generated by photoshop and handled by mspaint
   5427                if (info->mr == info->mg && info->mg == info->mb) {
   5428                   // ?!?!?
   5429                   return stbi__errpuc("bad BMP", "bad BMP");
   5430                }
   5431             } else
   5432                return stbi__errpuc("bad BMP", "bad BMP");
   5433          }
   5434       } else {
   5435          // V4/V5 header
   5436          int i;
   5437          if (hsz != 108 && hsz != 124)
   5438             return stbi__errpuc("bad BMP", "bad BMP");
   5439          info->mr = stbi__get32le(s);
   5440          info->mg = stbi__get32le(s);
   5441          info->mb = stbi__get32le(s);
   5442          info->ma = stbi__get32le(s);
   5443          if (compress != 3) // override mr/mg/mb unless in BI_BITFIELDS mode, as per docs
   5444             stbi__bmp_set_mask_defaults(info, compress);
   5445          stbi__get32le(s); // discard color space
   5446          for (i=0; i < 12; ++i)
   5447             stbi__get32le(s); // discard color space parameters
   5448          if (hsz == 124) {
   5449             stbi__get32le(s); // discard rendering intent
   5450             stbi__get32le(s); // discard offset of profile data
   5451             stbi__get32le(s); // discard size of profile data
   5452             stbi__get32le(s); // discard reserved
   5453          }
   5454       }
   5455    }
   5456    return (void *) 1;
   5457 }
   5458 
   5459 
   5460 static void *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
   5461 {
   5462    stbi_uc *out;
   5463    unsigned int mr=0,mg=0,mb=0,ma=0, all_a;
   5464    stbi_uc pal[256][4];
   5465    int psize=0,i,j,width;
   5466    int flip_vertically, pad, target;
   5467    stbi__bmp_data info;
   5468    STBI_NOTUSED(ri);
   5469 
   5470    info.all_a = 255;
   5471    if (stbi__bmp_parse_header(s, &info) == NULL)
   5472       return NULL; // error code already set
   5473 
   5474    flip_vertically = ((int) s->img_y) > 0;
   5475    s->img_y = abs((int) s->img_y);
   5476 
   5477    if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
   5478    if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
   5479 
   5480    mr = info.mr;
   5481    mg = info.mg;
   5482    mb = info.mb;
   5483    ma = info.ma;
   5484    all_a = info.all_a;
   5485 
   5486    if (info.hsz == 12) {
   5487       if (info.bpp < 24)
   5488          psize = (info.offset - info.extra_read - 24) / 3;
   5489    } else {
   5490       if (info.bpp < 16)
   5491          psize = (info.offset - info.extra_read - info.hsz) >> 2;
   5492    }
   5493    if (psize == 0) {
   5494       if (info.offset != s->callback_already_read + (s->img_buffer - s->img_buffer_original)) {
   5495         return stbi__errpuc("bad offset", "Corrupt BMP");
   5496       }
   5497    }
   5498 
   5499    if (info.bpp == 24 && ma == 0xff000000)
   5500       s->img_n = 3;
   5501    else
   5502       s->img_n = ma ? 4 : 3;
   5503    if (req_comp && req_comp >= 3) // we can directly decode 3 or 4
   5504       target = req_comp;
   5505    else
   5506       target = s->img_n; // if they want monochrome, we'll post-convert
   5507 
   5508    // sanity-check size
   5509    if (!stbi__mad3sizes_valid(target, s->img_x, s->img_y, 0))
   5510       return stbi__errpuc("too large", "Corrupt BMP");
   5511 
   5512    out = (stbi_uc *) stbi__malloc_mad3(target, s->img_x, s->img_y, 0);
   5513    if (!out) return stbi__errpuc("outofmem", "Out of memory");
   5514    if (info.bpp < 16) {
   5515       int z=0;
   5516       if (psize == 0 || psize > 256) { STBI_FREE(out); return stbi__errpuc("invalid", "Corrupt BMP"); }
   5517       for (i=0; i < psize; ++i) {
   5518          pal[i][2] = stbi__get8(s);
   5519          pal[i][1] = stbi__get8(s);
   5520          pal[i][0] = stbi__get8(s);
   5521          if (info.hsz != 12) stbi__get8(s);
   5522          pal[i][3] = 255;
   5523       }
   5524       stbi__skip(s, info.offset - info.extra_read - info.hsz - psize * (info.hsz == 12 ? 3 : 4));
   5525       if (info.bpp == 1) width = (s->img_x + 7) >> 3;
   5526       else if (info.bpp == 4) width = (s->img_x + 1) >> 1;
   5527       else if (info.bpp == 8) width = s->img_x;
   5528       else { STBI_FREE(out); return stbi__errpuc("bad bpp", "Corrupt BMP"); }
   5529       pad = (-width)&3;
   5530       if (info.bpp == 1) {
   5531          for (j=0; j < (int) s->img_y; ++j) {
   5532             int bit_offset = 7, v = stbi__get8(s);
   5533             for (i=0; i < (int) s->img_x; ++i) {
   5534                int color = (v>>bit_offset)&0x1;
   5535                out[z++] = pal[color][0];
   5536                out[z++] = pal[color][1];
   5537                out[z++] = pal[color][2];
   5538                if (target == 4) out[z++] = 255;
   5539                if (i+1 == (int) s->img_x) break;
   5540                if((--bit_offset) < 0) {
   5541                   bit_offset = 7;
   5542                   v = stbi__get8(s);
   5543                }
   5544             }
   5545             stbi__skip(s, pad);
   5546          }
   5547       } else {
   5548          for (j=0; j < (int) s->img_y; ++j) {
   5549             for (i=0; i < (int) s->img_x; i += 2) {
   5550                int v=stbi__get8(s),v2=0;
   5551                if (info.bpp == 4) {
   5552                   v2 = v & 15;
   5553                   v >>= 4;
   5554                }
   5555                out[z++] = pal[v][0];
   5556                out[z++] = pal[v][1];
   5557                out[z++] = pal[v][2];
   5558                if (target == 4) out[z++] = 255;
   5559                if (i+1 == (int) s->img_x) break;
   5560                v = (info.bpp == 8) ? stbi__get8(s) : v2;
   5561                out[z++] = pal[v][0];
   5562                out[z++] = pal[v][1];
   5563                out[z++] = pal[v][2];
   5564                if (target == 4) out[z++] = 255;
   5565             }
   5566             stbi__skip(s, pad);
   5567          }
   5568       }
   5569    } else {
   5570       int rshift=0,gshift=0,bshift=0,ashift=0,rcount=0,gcount=0,bcount=0,acount=0;
   5571       int z = 0;
   5572       int easy=0;
   5573       stbi__skip(s, info.offset - info.extra_read - info.hsz);
   5574       if (info.bpp == 24) width = 3 * s->img_x;
   5575       else if (info.bpp == 16) width = 2*s->img_x;
   5576       else /* bpp = 32 and pad = 0 */ width=0;
   5577       pad = (-width) & 3;
   5578       if (info.bpp == 24) {
   5579          easy = 1;
   5580       } else if (info.bpp == 32) {
   5581          if (mb == 0xff && mg == 0xff00 && mr == 0x00ff0000 && ma == 0xff000000)
   5582             easy = 2;
   5583       }
   5584       if (!easy) {
   5585          if (!mr || !mg || !mb) { STBI_FREE(out); return stbi__errpuc("bad masks", "Corrupt BMP"); }
   5586          // right shift amt to put high bit in position #7
   5587          rshift = stbi__high_bit(mr)-7; rcount = stbi__bitcount(mr);
   5588          gshift = stbi__high_bit(mg)-7; gcount = stbi__bitcount(mg);
   5589          bshift = stbi__high_bit(mb)-7; bcount = stbi__bitcount(mb);
   5590          ashift = stbi__high_bit(ma)-7; acount = stbi__bitcount(ma);
   5591          if (rcount > 8 || gcount > 8 || bcount > 8 || acount > 8) { STBI_FREE(out); return stbi__errpuc("bad masks", "Corrupt BMP"); }
   5592       }
   5593       for (j=0; j < (int) s->img_y; ++j) {
   5594          if (easy) {
   5595             for (i=0; i < (int) s->img_x; ++i) {
   5596                unsigned char a;
   5597                out[z+2] = stbi__get8(s);
   5598                out[z+1] = stbi__get8(s);
   5599                out[z+0] = stbi__get8(s);
   5600                z += 3;
   5601                a = (easy == 2 ? stbi__get8(s) : 255);
   5602                all_a |= a;
   5603                if (target == 4) out[z++] = a;
   5604             }
   5605          } else {
   5606             int bpp = info.bpp;
   5607             for (i=0; i < (int) s->img_x; ++i) {
   5608                stbi__uint32 v = (bpp == 16 ? (stbi__uint32) stbi__get16le(s) : stbi__get32le(s));
   5609                unsigned int a;
   5610                out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mr, rshift, rcount));
   5611                out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mg, gshift, gcount));
   5612                out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mb, bshift, bcount));
   5613                a = (ma ? stbi__shiftsigned(v & ma, ashift, acount) : 255);
   5614                all_a |= a;
   5615                if (target == 4) out[z++] = STBI__BYTECAST(a);
   5616             }
   5617          }
   5618          stbi__skip(s, pad);
   5619       }
   5620    }
   5621 
   5622    // if alpha channel is all 0s, replace with all 255s
   5623    if (target == 4 && all_a == 0)
   5624       for (i=4*s->img_x*s->img_y-1; i >= 0; i -= 4)
   5625          out[i] = 255;
   5626 
   5627    if (flip_vertically) {
   5628       stbi_uc t;
   5629       for (j=0; j < (int) s->img_y>>1; ++j) {
   5630          stbi_uc *p1 = out +      j     *s->img_x*target;
   5631          stbi_uc *p2 = out + (s->img_y-1-j)*s->img_x*target;
   5632          for (i=0; i < (int) s->img_x*target; ++i) {
   5633             t = p1[i]; p1[i] = p2[i]; p2[i] = t;
   5634          }
   5635       }
   5636    }
   5637 
   5638    if (req_comp && req_comp != target) {
   5639       out = stbi__convert_format(out, target, req_comp, s->img_x, s->img_y);
   5640       if (out == NULL) return out; // stbi__convert_format frees input on failure
   5641    }
   5642 
   5643    *x = s->img_x;
   5644    *y = s->img_y;
   5645    if (comp) *comp = s->img_n;
   5646    return out;
   5647 }
   5648 #endif
   5649 
   5650 // Targa Truevision - TGA
   5651 // by Jonathan Dummer
   5652 #ifndef STBI_NO_TGA
   5653 // returns STBI_rgb or whatever, 0 on error
   5654 static int stbi__tga_get_comp(int bits_per_pixel, int is_grey, int* is_rgb16)
   5655 {
   5656    // only RGB or RGBA (incl. 16bit) or grey allowed
   5657    if (is_rgb16) *is_rgb16 = 0;
   5658    switch(bits_per_pixel) {
   5659       case 8:  return STBI_grey;
   5660       case 16: if(is_grey) return STBI_grey_alpha;
   5661                // fallthrough
   5662       case 15: if(is_rgb16) *is_rgb16 = 1;
   5663                return STBI_rgb;
   5664       case 24: // fallthrough
   5665       case 32: return bits_per_pixel/8;
   5666       default: return 0;
   5667    }
   5668 }
   5669 
   5670 static int stbi__tga_info(stbi__context *s, int *x, int *y, int *comp)
   5671 {
   5672     int tga_w, tga_h, tga_comp, tga_image_type, tga_bits_per_pixel, tga_colormap_bpp;
   5673     int sz, tga_colormap_type;
   5674     stbi__get8(s);                   // discard Offset
   5675     tga_colormap_type = stbi__get8(s); // colormap type
   5676     if( tga_colormap_type > 1 ) {
   5677         stbi__rewind(s);
   5678         return 0;      // only RGB or indexed allowed
   5679     }
   5680     tga_image_type = stbi__get8(s); // image type
   5681     if ( tga_colormap_type == 1 ) { // colormapped (paletted) image
   5682         if (tga_image_type != 1 && tga_image_type != 9) {
   5683             stbi__rewind(s);
   5684             return 0;
   5685         }
   5686         stbi__skip(s,4);       // skip index of first colormap entry and number of entries
   5687         sz = stbi__get8(s);    //   check bits per palette color entry
   5688         if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) {
   5689             stbi__rewind(s);
   5690             return 0;
   5691         }
   5692         stbi__skip(s,4);       // skip image x and y origin
   5693         tga_colormap_bpp = sz;
   5694     } else { // "normal" image w/o colormap - only RGB or grey allowed, +/- RLE
   5695         if ( (tga_image_type != 2) && (tga_image_type != 3) && (tga_image_type != 10) && (tga_image_type != 11) ) {
   5696             stbi__rewind(s);
   5697             return 0; // only RGB or grey allowed, +/- RLE
   5698         }
   5699         stbi__skip(s,9); // skip colormap specification and image x/y origin
   5700         tga_colormap_bpp = 0;
   5701     }
   5702     tga_w = stbi__get16le(s);
   5703     if( tga_w < 1 ) {
   5704         stbi__rewind(s);
   5705         return 0;   // test width
   5706     }
   5707     tga_h = stbi__get16le(s);
   5708     if( tga_h < 1 ) {
   5709         stbi__rewind(s);
   5710         return 0;   // test height
   5711     }
   5712     tga_bits_per_pixel = stbi__get8(s); // bits per pixel
   5713     stbi__get8(s); // ignore alpha bits
   5714     if (tga_colormap_bpp != 0) {
   5715         if((tga_bits_per_pixel != 8) && (tga_bits_per_pixel != 16)) {
   5716             // when using a colormap, tga_bits_per_pixel is the size of the indexes
   5717             // I don't think anything but 8 or 16bit indexes makes sense
   5718             stbi__rewind(s);
   5719             return 0;
   5720         }
   5721         tga_comp = stbi__tga_get_comp(tga_colormap_bpp, 0, NULL);
   5722     } else {
   5723         tga_comp = stbi__tga_get_comp(tga_bits_per_pixel, (tga_image_type == 3) || (tga_image_type == 11), NULL);
   5724     }
   5725     if(!tga_comp) {
   5726       stbi__rewind(s);
   5727       return 0;
   5728     }
   5729     if (x) *x = tga_w;
   5730     if (y) *y = tga_h;
   5731     if (comp) *comp = tga_comp;
   5732     return 1;                   // seems to have passed everything
   5733 }
   5734 
   5735 static int stbi__tga_test(stbi__context *s)
   5736 {
   5737    int res = 0;
   5738    int sz, tga_color_type;
   5739    stbi__get8(s);      //   discard Offset
   5740    tga_color_type = stbi__get8(s);   //   color type
   5741    if ( tga_color_type > 1 ) goto errorEnd;   //   only RGB or indexed allowed
   5742    sz = stbi__get8(s);   //   image type
   5743    if ( tga_color_type == 1 ) { // colormapped (paletted) image
   5744       if (sz != 1 && sz != 9) goto errorEnd; // colortype 1 demands image type 1 or 9
   5745       stbi__skip(s,4);       // skip index of first colormap entry and number of entries
   5746       sz = stbi__get8(s);    //   check bits per palette color entry
   5747       if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) goto errorEnd;
   5748       stbi__skip(s,4);       // skip image x and y origin
   5749    } else { // "normal" image w/o colormap
   5750       if ( (sz != 2) && (sz != 3) && (sz != 10) && (sz != 11) ) goto errorEnd; // only RGB or grey allowed, +/- RLE
   5751       stbi__skip(s,9); // skip colormap specification and image x/y origin
   5752    }
   5753    if ( stbi__get16le(s) < 1 ) goto errorEnd;      //   test width
   5754    if ( stbi__get16le(s) < 1 ) goto errorEnd;      //   test height
   5755    sz = stbi__get8(s);   //   bits per pixel
   5756    if ( (tga_color_type == 1) && (sz != 8) && (sz != 16) ) goto errorEnd; // for colormapped images, bpp is size of an index
   5757    if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) goto errorEnd;
   5758 
   5759    res = 1; // if we got this far, everything's good and we can return 1 instead of 0
   5760 
   5761 errorEnd:
   5762    stbi__rewind(s);
   5763    return res;
   5764 }
   5765 
   5766 // read 16bit value and convert to 24bit RGB
   5767 static void stbi__tga_read_rgb16(stbi__context *s, stbi_uc* out)
   5768 {
   5769    stbi__uint16 px = (stbi__uint16)stbi__get16le(s);
   5770    stbi__uint16 fiveBitMask = 31;
   5771    // we have 3 channels with 5bits each
   5772    int r = (px >> 10) & fiveBitMask;
   5773    int g = (px >> 5) & fiveBitMask;
   5774    int b = px & fiveBitMask;
   5775    // Note that this saves the data in RGB(A) order, so it doesn't need to be swapped later
   5776    out[0] = (stbi_uc)((r * 255)/31);
   5777    out[1] = (stbi_uc)((g * 255)/31);
   5778    out[2] = (stbi_uc)((b * 255)/31);
   5779 
   5780    // some people claim that the most significant bit might be used for alpha
   5781    // (possibly if an alpha-bit is set in the "image descriptor byte")
   5782    // but that only made 16bit test images completely translucent..
   5783    // so let's treat all 15 and 16bit TGAs as RGB with no alpha.
   5784 }
   5785 
   5786 static void *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
   5787 {
   5788    //   read in the TGA header stuff
   5789    int tga_offset = stbi__get8(s);
   5790    int tga_indexed = stbi__get8(s);
   5791    int tga_image_type = stbi__get8(s);
   5792    int tga_is_RLE = 0;
   5793    int tga_palette_start = stbi__get16le(s);
   5794    int tga_palette_len = stbi__get16le(s);
   5795    int tga_palette_bits = stbi__get8(s);
   5796    int tga_x_origin = stbi__get16le(s);
   5797    int tga_y_origin = stbi__get16le(s);
   5798    int tga_width = stbi__get16le(s);
   5799    int tga_height = stbi__get16le(s);
   5800    int tga_bits_per_pixel = stbi__get8(s);
   5801    int tga_comp, tga_rgb16=0;
   5802    int tga_inverted = stbi__get8(s);
   5803    // int tga_alpha_bits = tga_inverted & 15; // the 4 lowest bits - unused (useless?)
   5804    //   image data
   5805    unsigned char *tga_data;
   5806    unsigned char *tga_palette = NULL;
   5807    int i, j;
   5808    unsigned char raw_data[4] = {0};
   5809    int RLE_count = 0;
   5810    int RLE_repeating = 0;
   5811    int read_next_pixel = 1;
   5812    STBI_NOTUSED(ri);
   5813    STBI_NOTUSED(tga_x_origin); // @TODO
   5814    STBI_NOTUSED(tga_y_origin); // @TODO
   5815 
   5816    if (tga_height > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
   5817    if (tga_width > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
   5818 
   5819    //   do a tiny bit of precessing
   5820    if ( tga_image_type >= 8 )
   5821    {
   5822       tga_image_type -= 8;
   5823       tga_is_RLE = 1;
   5824    }
   5825    tga_inverted = 1 - ((tga_inverted >> 5) & 1);
   5826 
   5827    //   If I'm paletted, then I'll use the number of bits from the palette
   5828    if ( tga_indexed ) tga_comp = stbi__tga_get_comp(tga_palette_bits, 0, &tga_rgb16);
   5829    else tga_comp = stbi__tga_get_comp(tga_bits_per_pixel, (tga_image_type == 3), &tga_rgb16);
   5830 
   5831    if(!tga_comp) // shouldn't really happen, stbi__tga_test() should have ensured basic consistency
   5832       return stbi__errpuc("bad format", "Can't find out TGA pixelformat");
   5833 
   5834    //   tga info
   5835    *x = tga_width;
   5836    *y = tga_height;
   5837    if (comp) *comp = tga_comp;
   5838 
   5839    if (!stbi__mad3sizes_valid(tga_width, tga_height, tga_comp, 0))
   5840       return stbi__errpuc("too large", "Corrupt TGA");
   5841 
   5842    tga_data = (unsigned char*)stbi__malloc_mad3(tga_width, tga_height, tga_comp, 0);
   5843    if (!tga_data) return stbi__errpuc("outofmem", "Out of memory");
   5844 
   5845    // skip to the data's starting position (offset usually = 0)
   5846    stbi__skip(s, tga_offset );
   5847 
   5848    if ( !tga_indexed && !tga_is_RLE && !tga_rgb16 ) {
   5849       for (i=0; i < tga_height; ++i) {
   5850          int row = tga_inverted ? tga_height -i - 1 : i;
   5851          stbi_uc *tga_row = tga_data + row*tga_width*tga_comp;
   5852          stbi__getn(s, tga_row, tga_width * tga_comp);
   5853       }
   5854    } else  {
   5855       //   do I need to load a palette?
   5856       if ( tga_indexed)
   5857       {
   5858          if (tga_palette_len == 0) {  /* you have to have at least one entry! */
   5859             STBI_FREE(tga_data);
   5860             return stbi__errpuc("bad palette", "Corrupt TGA");
   5861          }
   5862 
   5863          //   any data to skip? (offset usually = 0)
   5864          stbi__skip(s, tga_palette_start );
   5865          //   load the palette
   5866          tga_palette = (unsigned char*)stbi__malloc_mad2(tga_palette_len, tga_comp, 0);
   5867          if (!tga_palette) {
   5868             STBI_FREE(tga_data);
   5869             return stbi__errpuc("outofmem", "Out of memory");
   5870          }
   5871          if (tga_rgb16) {
   5872             stbi_uc *pal_entry = tga_palette;
   5873             STBI_ASSERT(tga_comp == STBI_rgb);
   5874             for (i=0; i < tga_palette_len; ++i) {
   5875                stbi__tga_read_rgb16(s, pal_entry);
   5876                pal_entry += tga_comp;
   5877             }
   5878          } else if (!stbi__getn(s, tga_palette, tga_palette_len * tga_comp)) {
   5879                STBI_FREE(tga_data);
   5880                STBI_FREE(tga_palette);
   5881                return stbi__errpuc("bad palette", "Corrupt TGA");
   5882          }
   5883       }
   5884       //   load the data
   5885       for (i=0; i < tga_width * tga_height; ++i)
   5886       {
   5887          //   if I'm in RLE mode, do I need to get a RLE stbi__pngchunk?
   5888          if ( tga_is_RLE )
   5889          {
   5890             if ( RLE_count == 0 )
   5891             {
   5892                //   yep, get the next byte as a RLE command
   5893                int RLE_cmd = stbi__get8(s);
   5894                RLE_count = 1 + (RLE_cmd & 127);
   5895                RLE_repeating = RLE_cmd >> 7;
   5896                read_next_pixel = 1;
   5897             } else if ( !RLE_repeating )
   5898             {
   5899                read_next_pixel = 1;
   5900             }
   5901          } else
   5902          {
   5903             read_next_pixel = 1;
   5904          }
   5905          //   OK, if I need to read a pixel, do it now
   5906          if ( read_next_pixel )
   5907          {
   5908             //   load however much data we did have
   5909             if ( tga_indexed )
   5910             {
   5911                // read in index, then perform the lookup
   5912                int pal_idx = (tga_bits_per_pixel == 8) ? stbi__get8(s) : stbi__get16le(s);
   5913                if ( pal_idx >= tga_palette_len ) {
   5914                   // invalid index
   5915                   pal_idx = 0;
   5916                }
   5917                pal_idx *= tga_comp;
   5918                for (j = 0; j < tga_comp; ++j) {
   5919                   raw_data[j] = tga_palette[pal_idx+j];
   5920                }
   5921             } else if(tga_rgb16) {
   5922                STBI_ASSERT(tga_comp == STBI_rgb);
   5923                stbi__tga_read_rgb16(s, raw_data);
   5924             } else {
   5925                //   read in the data raw
   5926                for (j = 0; j < tga_comp; ++j) {
   5927                   raw_data[j] = stbi__get8(s);
   5928                }
   5929             }
   5930             //   clear the reading flag for the next pixel
   5931             read_next_pixel = 0;
   5932          } // end of reading a pixel
   5933 
   5934          // copy data
   5935          for (j = 0; j < tga_comp; ++j)
   5936            tga_data[i*tga_comp+j] = raw_data[j];
   5937 
   5938          //   in case we're in RLE mode, keep counting down
   5939          --RLE_count;
   5940       }
   5941       //   do I need to invert the image?
   5942       if ( tga_inverted )
   5943       {
   5944          for (j = 0; j*2 < tga_height; ++j)
   5945          {
   5946             int index1 = j * tga_width * tga_comp;
   5947             int index2 = (tga_height - 1 - j) * tga_width * tga_comp;
   5948             for (i = tga_width * tga_comp; i > 0; --i)
   5949             {
   5950                unsigned char temp = tga_data[index1];
   5951                tga_data[index1] = tga_data[index2];
   5952                tga_data[index2] = temp;
   5953                ++index1;
   5954                ++index2;
   5955             }
   5956          }
   5957       }
   5958       //   clear my palette, if I had one
   5959       if ( tga_palette != NULL )
   5960       {
   5961          STBI_FREE( tga_palette );
   5962       }
   5963    }
   5964 
   5965    // swap RGB - if the source data was RGB16, it already is in the right order
   5966    if (tga_comp >= 3 && !tga_rgb16)
   5967    {
   5968       unsigned char* tga_pixel = tga_data;
   5969       for (i=0; i < tga_width * tga_height; ++i)
   5970       {
   5971          unsigned char temp = tga_pixel[0];
   5972          tga_pixel[0] = tga_pixel[2];
   5973          tga_pixel[2] = temp;
   5974          tga_pixel += tga_comp;
   5975       }
   5976    }
   5977 
   5978    // convert to target component count
   5979    if (req_comp && req_comp != tga_comp)
   5980       tga_data = stbi__convert_format(tga_data, tga_comp, req_comp, tga_width, tga_height);
   5981 
   5982    //   the things I do to get rid of an error message, and yet keep
   5983    //   Microsoft's C compilers happy... [8^(
   5984    tga_palette_start = tga_palette_len = tga_palette_bits =
   5985          tga_x_origin = tga_y_origin = 0;
   5986    STBI_NOTUSED(tga_palette_start);
   5987    //   OK, done
   5988    return tga_data;
   5989 }
   5990 #endif
   5991 
   5992 // *************************************************************************************************
   5993 // Photoshop PSD loader -- PD by Thatcher Ulrich, integration by Nicolas Schulz, tweaked by STB
   5994 
   5995 #ifndef STBI_NO_PSD
   5996 static int stbi__psd_test(stbi__context *s)
   5997 {
   5998    int r = (stbi__get32be(s) == 0x38425053);
   5999    stbi__rewind(s);
   6000    return r;
   6001 }
   6002 
   6003 static int stbi__psd_decode_rle(stbi__context *s, stbi_uc *p, int pixelCount)
   6004 {
   6005    int count, nleft, len;
   6006 
   6007    count = 0;
   6008    while ((nleft = pixelCount - count) > 0) {
   6009       len = stbi__get8(s);
   6010       if (len == 128) {
   6011          // No-op.
   6012       } else if (len < 128) {
   6013          // Copy next len+1 bytes literally.
   6014          len++;
   6015          if (len > nleft) return 0; // corrupt data
   6016          count += len;
   6017          while (len) {
   6018             *p = stbi__get8(s);
   6019             p += 4;
   6020             len--;
   6021          }
   6022       } else if (len > 128) {
   6023          stbi_uc   val;
   6024          // Next -len+1 bytes in the dest are replicated from next source byte.
   6025          // (Interpret len as a negative 8-bit int.)
   6026          len = 257 - len;
   6027          if (len > nleft) return 0; // corrupt data
   6028          val = stbi__get8(s);
   6029          count += len;
   6030          while (len) {
   6031             *p = val;
   6032             p += 4;
   6033             len--;
   6034          }
   6035       }
   6036    }
   6037 
   6038    return 1;
   6039 }
   6040 
   6041 static void *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc)
   6042 {
   6043    int pixelCount;
   6044    int channelCount, compression;
   6045    int channel, i;
   6046    int bitdepth;
   6047    int w,h;
   6048    stbi_uc *out;
   6049    STBI_NOTUSED(ri);
   6050 
   6051    // Check identifier
   6052    if (stbi__get32be(s) != 0x38425053)   // "8BPS"
   6053       return stbi__errpuc("not PSD", "Corrupt PSD image");
   6054 
   6055    // Check file type version.
   6056    if (stbi__get16be(s) != 1)
   6057       return stbi__errpuc("wrong version", "Unsupported version of PSD image");
   6058 
   6059    // Skip 6 reserved bytes.
   6060    stbi__skip(s, 6 );
   6061 
   6062    // Read the number of channels (R, G, B, A, etc).
   6063    channelCount = stbi__get16be(s);
   6064    if (channelCount < 0 || channelCount > 16)
   6065       return stbi__errpuc("wrong channel count", "Unsupported number of channels in PSD image");
   6066 
   6067    // Read the rows and columns of the image.
   6068    h = stbi__get32be(s);
   6069    w = stbi__get32be(s);
   6070 
   6071    if (h > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
   6072    if (w > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
   6073 
   6074    // Make sure the depth is 8 bits.
   6075    bitdepth = stbi__get16be(s);
   6076    if (bitdepth != 8 && bitdepth != 16)
   6077       return stbi__errpuc("unsupported bit depth", "PSD bit depth is not 8 or 16 bit");
   6078 
   6079    // Make sure the color mode is RGB.
   6080    // Valid options are:
   6081    //   0: Bitmap
   6082    //   1: Grayscale
   6083    //   2: Indexed color
   6084    //   3: RGB color
   6085    //   4: CMYK color
   6086    //   7: Multichannel
   6087    //   8: Duotone
   6088    //   9: Lab color
   6089    if (stbi__get16be(s) != 3)
   6090       return stbi__errpuc("wrong color format", "PSD is not in RGB color format");
   6091 
   6092    // Skip the Mode Data.  (It's the palette for indexed color; other info for other modes.)
   6093    stbi__skip(s,stbi__get32be(s) );
   6094 
   6095    // Skip the image resources.  (resolution, pen tool paths, etc)
   6096    stbi__skip(s, stbi__get32be(s) );
   6097 
   6098    // Skip the reserved data.
   6099    stbi__skip(s, stbi__get32be(s) );
   6100 
   6101    // Find out if the data is compressed.
   6102    // Known values:
   6103    //   0: no compression
   6104    //   1: RLE compressed
   6105    compression = stbi__get16be(s);
   6106    if (compression > 1)
   6107       return stbi__errpuc("bad compression", "PSD has an unknown compression format");
   6108 
   6109    // Check size
   6110    if (!stbi__mad3sizes_valid(4, w, h, 0))
   6111       return stbi__errpuc("too large", "Corrupt PSD");
   6112 
   6113    // Create the destination image.
   6114 
   6115    if (!compression && bitdepth == 16 && bpc == 16) {
   6116       out = (stbi_uc *) stbi__malloc_mad3(8, w, h, 0);
   6117       ri->bits_per_channel = 16;
   6118    } else
   6119       out = (stbi_uc *) stbi__malloc(4 * w*h);
   6120 
   6121    if (!out) return stbi__errpuc("outofmem", "Out of memory");
   6122    pixelCount = w*h;
   6123 
   6124    // Initialize the data to zero.
   6125    //memset( out, 0, pixelCount * 4 );
   6126 
   6127    // Finally, the image data.
   6128    if (compression) {
   6129       // RLE as used by .PSD and .TIFF
   6130       // Loop until you get the number of unpacked bytes you are expecting:
   6131       //     Read the next source byte into n.
   6132       //     If n is between 0 and 127 inclusive, copy the next n+1 bytes literally.
   6133       //     Else if n is between -127 and -1 inclusive, copy the next byte -n+1 times.
   6134       //     Else if n is 128, noop.
   6135       // Endloop
   6136 
   6137       // The RLE-compressed data is preceded by a 2-byte data count for each row in the data,
   6138       // which we're going to just skip.
   6139       stbi__skip(s, h * channelCount * 2 );
   6140 
   6141       // Read the RLE data by channel.
   6142       for (channel = 0; channel < 4; channel++) {
   6143          stbi_uc *p;
   6144 
   6145          p = out+channel;
   6146          if (channel >= channelCount) {
   6147             // Fill this channel with default data.
   6148             for (i = 0; i < pixelCount; i++, p += 4)
   6149                *p = (channel == 3 ? 255 : 0);
   6150          } else {
   6151             // Read the RLE data.
   6152             if (!stbi__psd_decode_rle(s, p, pixelCount)) {
   6153                STBI_FREE(out);
   6154                return stbi__errpuc("corrupt", "bad RLE data");
   6155             }
   6156          }
   6157       }
   6158 
   6159    } else {
   6160       // We're at the raw image data.  It's each channel in order (Red, Green, Blue, Alpha, ...)
   6161       // where each channel consists of an 8-bit (or 16-bit) value for each pixel in the image.
   6162 
   6163       // Read the data by channel.
   6164       for (channel = 0; channel < 4; channel++) {
   6165          if (channel >= channelCount) {
   6166             // Fill this channel with default data.
   6167             if (bitdepth == 16 && bpc == 16) {
   6168                stbi__uint16 *q = ((stbi__uint16 *) out) + channel;
   6169                stbi__uint16 val = channel == 3 ? 65535 : 0;
   6170                for (i = 0; i < pixelCount; i++, q += 4)
   6171                   *q = val;
   6172             } else {
   6173                stbi_uc *p = out+channel;
   6174                stbi_uc val = channel == 3 ? 255 : 0;
   6175                for (i = 0; i < pixelCount; i++, p += 4)
   6176                   *p = val;
   6177             }
   6178          } else {
   6179             if (ri->bits_per_channel == 16) {    // output bpc
   6180                stbi__uint16 *q = ((stbi__uint16 *) out) + channel;
   6181                for (i = 0; i < pixelCount; i++, q += 4)
   6182                   *q = (stbi__uint16) stbi__get16be(s);
   6183             } else {
   6184                stbi_uc *p = out+channel;
   6185                if (bitdepth == 16) {  // input bpc
   6186                   for (i = 0; i < pixelCount; i++, p += 4)
   6187                      *p = (stbi_uc) (stbi__get16be(s) >> 8);
   6188                } else {
   6189                   for (i = 0; i < pixelCount; i++, p += 4)
   6190                      *p = stbi__get8(s);
   6191                }
   6192             }
   6193          }
   6194       }
   6195    }
   6196 
   6197    // remove weird white matte from PSD
   6198    if (channelCount >= 4) {
   6199       if (ri->bits_per_channel == 16) {
   6200          for (i=0; i < w*h; ++i) {
   6201             stbi__uint16 *pixel = (stbi__uint16 *) out + 4*i;
   6202             if (pixel[3] != 0 && pixel[3] != 65535) {
   6203                float a = pixel[3] / 65535.0f;
   6204                float ra = 1.0f / a;
   6205                float inv_a = 65535.0f * (1 - ra);
   6206                pixel[0] = (stbi__uint16) (pixel[0]*ra + inv_a);
   6207                pixel[1] = (stbi__uint16) (pixel[1]*ra + inv_a);
   6208                pixel[2] = (stbi__uint16) (pixel[2]*ra + inv_a);
   6209             }
   6210          }
   6211       } else {
   6212          for (i=0; i < w*h; ++i) {
   6213             unsigned char *pixel = out + 4*i;
   6214             if (pixel[3] != 0 && pixel[3] != 255) {
   6215                float a = pixel[3] / 255.0f;
   6216                float ra = 1.0f / a;
   6217                float inv_a = 255.0f * (1 - ra);
   6218                pixel[0] = (unsigned char) (pixel[0]*ra + inv_a);
   6219                pixel[1] = (unsigned char) (pixel[1]*ra + inv_a);
   6220                pixel[2] = (unsigned char) (pixel[2]*ra + inv_a);
   6221             }
   6222          }
   6223       }
   6224    }
   6225 
   6226    // convert to desired output format
   6227    if (req_comp && req_comp != 4) {
   6228       if (ri->bits_per_channel == 16)
   6229          out = (stbi_uc *) stbi__convert_format16((stbi__uint16 *) out, 4, req_comp, w, h);
   6230       else
   6231          out = stbi__convert_format(out, 4, req_comp, w, h);
   6232       if (out == NULL) return out; // stbi__convert_format frees input on failure
   6233    }
   6234 
   6235    if (comp) *comp = 4;
   6236    *y = h;
   6237    *x = w;
   6238 
   6239    return out;
   6240 }
   6241 #endif
   6242 
   6243 // *************************************************************************************************
   6244 // Softimage PIC loader
   6245 // by Tom Seddon
   6246 //
   6247 // See http://softimage.wiki.softimage.com/index.php/INFO:_PIC_file_format
   6248 // See http://ozviz.wasp.uwa.edu.au/~pbourke/dataformats/softimagepic/
   6249 
   6250 #ifndef STBI_NO_PIC
   6251 static int stbi__pic_is4(stbi__context *s,const char *str)
   6252 {
   6253    int i;
   6254    for (i=0; i<4; ++i)
   6255       if (stbi__get8(s) != (stbi_uc)str[i])
   6256          return 0;
   6257 
   6258    return 1;
   6259 }
   6260 
   6261 static int stbi__pic_test_core(stbi__context *s)
   6262 {
   6263    int i;
   6264 
   6265    if (!stbi__pic_is4(s,"\x53\x80\xF6\x34"))
   6266       return 0;
   6267 
   6268    for(i=0;i<84;++i)
   6269       stbi__get8(s);
   6270 
   6271    if (!stbi__pic_is4(s,"PICT"))
   6272       return 0;
   6273 
   6274    return 1;
   6275 }
   6276 
   6277 typedef struct
   6278 {
   6279    stbi_uc size,type,channel;
   6280 } stbi__pic_packet;
   6281 
   6282 static stbi_uc *stbi__readval(stbi__context *s, int channel, stbi_uc *dest)
   6283 {
   6284    int mask=0x80, i;
   6285 
   6286    for (i=0; i<4; ++i, mask>>=1) {
   6287       if (channel & mask) {
   6288          if (stbi__at_eof(s)) return stbi__errpuc("bad file","PIC file too short");
   6289          dest[i]=stbi__get8(s);
   6290       }
   6291    }
   6292 
   6293    return dest;
   6294 }
   6295 
   6296 static void stbi__copyval(int channel,stbi_uc *dest,const stbi_uc *src)
   6297 {
   6298    int mask=0x80,i;
   6299 
   6300    for (i=0;i<4; ++i, mask>>=1)
   6301       if (channel&mask)
   6302          dest[i]=src[i];
   6303 }
   6304 
   6305 static stbi_uc *stbi__pic_load_core(stbi__context *s,int width,int height,int *comp, stbi_uc *result)
   6306 {
   6307    int act_comp=0,num_packets=0,y,chained;
   6308    stbi__pic_packet packets[10];
   6309 
   6310    // this will (should...) cater for even some bizarre stuff like having data
   6311     // for the same channel in multiple packets.
   6312    do {
   6313       stbi__pic_packet *packet;
   6314 
   6315       if (num_packets==sizeof(packets)/sizeof(packets[0]))
   6316          return stbi__errpuc("bad format","too many packets");
   6317 
   6318       packet = &packets[num_packets++];
   6319 
   6320       chained = stbi__get8(s);
   6321       packet->size    = stbi__get8(s);
   6322       packet->type    = stbi__get8(s);
   6323       packet->channel = stbi__get8(s);
   6324 
   6325       act_comp |= packet->channel;
   6326 
   6327       if (stbi__at_eof(s))          return stbi__errpuc("bad file","file too short (reading packets)");
   6328       if (packet->size != 8)  return stbi__errpuc("bad format","packet isn't 8bpp");
   6329    } while (chained);
   6330 
   6331    *comp = (act_comp & 0x10 ? 4 : 3); // has alpha channel?
   6332 
   6333    for(y=0; y<height; ++y) {
   6334       int packet_idx;
   6335 
   6336       for(packet_idx=0; packet_idx < num_packets; ++packet_idx) {
   6337          stbi__pic_packet *packet = &packets[packet_idx];
   6338          stbi_uc *dest = result+y*width*4;
   6339 
   6340          switch (packet->type) {
   6341             default:
   6342                return stbi__errpuc("bad format","packet has bad compression type");
   6343 
   6344             case 0: {//uncompressed
   6345                int x;
   6346 
   6347                for(x=0;x<width;++x, dest+=4)
   6348                   if (!stbi__readval(s,packet->channel,dest))
   6349                      return 0;
   6350                break;
   6351             }
   6352 
   6353             case 1://Pure RLE
   6354                {
   6355                   int left=width, i;
   6356 
   6357                   while (left>0) {
   6358                      stbi_uc count,value[4];
   6359 
   6360                      count=stbi__get8(s);
   6361                      if (stbi__at_eof(s))   return stbi__errpuc("bad file","file too short (pure read count)");
   6362 
   6363                      if (count > left)
   6364                         count = (stbi_uc) left;
   6365 
   6366                      if (!stbi__readval(s,packet->channel,value))  return 0;
   6367 
   6368                      for(i=0; i<count; ++i,dest+=4)
   6369                         stbi__copyval(packet->channel,dest,value);
   6370                      left -= count;
   6371                   }
   6372                }
   6373                break;
   6374 
   6375             case 2: {//Mixed RLE
   6376                int left=width;
   6377                while (left>0) {
   6378                   int count = stbi__get8(s), i;
   6379                   if (stbi__at_eof(s))  return stbi__errpuc("bad file","file too short (mixed read count)");
   6380 
   6381                   if (count >= 128) { // Repeated
   6382                      stbi_uc value[4];
   6383 
   6384                      if (count==128)
   6385                         count = stbi__get16be(s);
   6386                      else
   6387                         count -= 127;
   6388                      if (count > left)
   6389                         return stbi__errpuc("bad file","scanline overrun");
   6390 
   6391                      if (!stbi__readval(s,packet->channel,value))
   6392                         return 0;
   6393 
   6394                      for(i=0;i<count;++i, dest += 4)
   6395                         stbi__copyval(packet->channel,dest,value);
   6396                   } else { // Raw
   6397                      ++count;
   6398                      if (count>left) return stbi__errpuc("bad file","scanline overrun");
   6399 
   6400                      for(i=0;i<count;++i, dest+=4)
   6401                         if (!stbi__readval(s,packet->channel,dest))
   6402                            return 0;
   6403                   }
   6404                   left-=count;
   6405                }
   6406                break;
   6407             }
   6408          }
   6409       }
   6410    }
   6411 
   6412    return result;
   6413 }
   6414 
   6415 static void *stbi__pic_load(stbi__context *s,int *px,int *py,int *comp,int req_comp, stbi__result_info *ri)
   6416 {
   6417    stbi_uc *result;
   6418    int i, x,y, internal_comp;
   6419    STBI_NOTUSED(ri);
   6420 
   6421    if (!comp) comp = &internal_comp;
   6422 
   6423    for (i=0; i<92; ++i)
   6424       stbi__get8(s);
   6425 
   6426    x = stbi__get16be(s);
   6427    y = stbi__get16be(s);
   6428 
   6429    if (y > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
   6430    if (x > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
   6431 
   6432    if (stbi__at_eof(s))  return stbi__errpuc("bad file","file too short (pic header)");
   6433    if (!stbi__mad3sizes_valid(x, y, 4, 0)) return stbi__errpuc("too large", "PIC image too large to decode");
   6434 
   6435    stbi__get32be(s); //skip `ratio'
   6436    stbi__get16be(s); //skip `fields'
   6437    stbi__get16be(s); //skip `pad'
   6438 
   6439    // intermediate buffer is RGBA
   6440    result = (stbi_uc *) stbi__malloc_mad3(x, y, 4, 0);
   6441    if (!result) return stbi__errpuc("outofmem", "Out of memory");
   6442    memset(result, 0xff, x*y*4);
   6443 
   6444    if (!stbi__pic_load_core(s,x,y,comp, result)) {
   6445       STBI_FREE(result);
   6446       result=0;
   6447    }
   6448    *px = x;
   6449    *py = y;
   6450    if (req_comp == 0) req_comp = *comp;
   6451    result=stbi__convert_format(result,4,req_comp,x,y);
   6452 
   6453    return result;
   6454 }
   6455 
   6456 static int stbi__pic_test(stbi__context *s)
   6457 {
   6458    int r = stbi__pic_test_core(s);
   6459    stbi__rewind(s);
   6460    return r;
   6461 }
   6462 #endif
   6463 
   6464 // *************************************************************************************************
   6465 // GIF loader -- public domain by Jean-Marc Lienher -- simplified/shrunk by stb
   6466 
   6467 #ifndef STBI_NO_GIF
   6468 typedef struct
   6469 {
   6470    stbi__int16 prefix;
   6471    stbi_uc first;
   6472    stbi_uc suffix;
   6473 } stbi__gif_lzw;
   6474 
   6475 typedef struct
   6476 {
   6477    int w,h;
   6478    stbi_uc *out;                 // output buffer (always 4 components)
   6479    stbi_uc *background;          // The current "background" as far as a gif is concerned
   6480    stbi_uc *history;
   6481    int flags, bgindex, ratio, transparent, eflags;
   6482    stbi_uc  pal[256][4];
   6483    stbi_uc lpal[256][4];
   6484    stbi__gif_lzw codes[8192];
   6485    stbi_uc *color_table;
   6486    int parse, step;
   6487    int lflags;
   6488    int start_x, start_y;
   6489    int max_x, max_y;
   6490    int cur_x, cur_y;
   6491    int line_size;
   6492    int delay;
   6493 } stbi__gif;
   6494 
   6495 static int stbi__gif_test_raw(stbi__context *s)
   6496 {
   6497    int sz;
   6498    if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8') return 0;
   6499    sz = stbi__get8(s);
   6500    if (sz != '9' && sz != '7') return 0;
   6501    if (stbi__get8(s) != 'a') return 0;
   6502    return 1;
   6503 }
   6504 
   6505 static int stbi__gif_test(stbi__context *s)
   6506 {
   6507    int r = stbi__gif_test_raw(s);
   6508    stbi__rewind(s);
   6509    return r;
   6510 }
   6511 
   6512 static void stbi__gif_parse_colortable(stbi__context *s, stbi_uc pal[256][4], int num_entries, int transp)
   6513 {
   6514    int i;
   6515    for (i=0; i < num_entries; ++i) {
   6516       pal[i][2] = stbi__get8(s);
   6517       pal[i][1] = stbi__get8(s);
   6518       pal[i][0] = stbi__get8(s);
   6519       pal[i][3] = transp == i ? 0 : 255;
   6520    }
   6521 }
   6522 
   6523 static int stbi__gif_header(stbi__context *s, stbi__gif *g, int *comp, int is_info)
   6524 {
   6525    stbi_uc version;
   6526    if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8')
   6527       return stbi__err("not GIF", "Corrupt GIF");
   6528 
   6529    version = stbi__get8(s);
   6530    if (version != '7' && version != '9')    return stbi__err("not GIF", "Corrupt GIF");
   6531    if (stbi__get8(s) != 'a')                return stbi__err("not GIF", "Corrupt GIF");
   6532 
   6533    stbi__g_failure_reason = "";
   6534    g->w = stbi__get16le(s);
   6535    g->h = stbi__get16le(s);
   6536    g->flags = stbi__get8(s);
   6537    g->bgindex = stbi__get8(s);
   6538    g->ratio = stbi__get8(s);
   6539    g->transparent = -1;
   6540 
   6541    if (g->w > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
   6542    if (g->h > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
   6543 
   6544    if (comp != 0) *comp = 4;  // can't actually tell whether it's 3 or 4 until we parse the comments
   6545 
   6546    if (is_info) return 1;
   6547 
   6548    if (g->flags & 0x80)
   6549       stbi__gif_parse_colortable(s,g->pal, 2 << (g->flags & 7), -1);
   6550 
   6551    return 1;
   6552 }
   6553 
   6554 static int stbi__gif_info_raw(stbi__context *s, int *x, int *y, int *comp)
   6555 {
   6556    stbi__gif* g = (stbi__gif*) stbi__malloc(sizeof(stbi__gif));
   6557    if (!g) return stbi__err("outofmem", "Out of memory");
   6558    if (!stbi__gif_header(s, g, comp, 1)) {
   6559       STBI_FREE(g);
   6560       stbi__rewind( s );
   6561       return 0;
   6562    }
   6563    if (x) *x = g->w;
   6564    if (y) *y = g->h;
   6565    STBI_FREE(g);
   6566    return 1;
   6567 }
   6568 
   6569 static void stbi__out_gif_code(stbi__gif *g, stbi__uint16 code)
   6570 {
   6571    stbi_uc *p, *c;
   6572    int idx;
   6573 
   6574    // recurse to decode the prefixes, since the linked-list is backwards,
   6575    // and working backwards through an interleaved image would be nasty
   6576    if (g->codes[code].prefix >= 0)
   6577       stbi__out_gif_code(g, g->codes[code].prefix);
   6578 
   6579    if (g->cur_y >= g->max_y) return;
   6580 
   6581    idx = g->cur_x + g->cur_y;
   6582    p = &g->out[idx];
   6583    g->history[idx / 4] = 1;
   6584 
   6585    c = &g->color_table[g->codes[code].suffix * 4];
   6586    if (c[3] > 128) { // don't render transparent pixels;
   6587       p[0] = c[2];
   6588       p[1] = c[1];
   6589       p[2] = c[0];
   6590       p[3] = c[3];
   6591    }
   6592    g->cur_x += 4;
   6593 
   6594    if (g->cur_x >= g->max_x) {
   6595       g->cur_x = g->start_x;
   6596       g->cur_y += g->step;
   6597 
   6598       while (g->cur_y >= g->max_y && g->parse > 0) {
   6599          g->step = (1 << g->parse) * g->line_size;
   6600          g->cur_y = g->start_y + (g->step >> 1);
   6601          --g->parse;
   6602       }
   6603    }
   6604 }
   6605 
   6606 static stbi_uc *stbi__process_gif_raster(stbi__context *s, stbi__gif *g)
   6607 {
   6608    stbi_uc lzw_cs;
   6609    stbi__int32 len, init_code;
   6610    stbi__uint32 first;
   6611    stbi__int32 codesize, codemask, avail, oldcode, bits, valid_bits, clear;
   6612    stbi__gif_lzw *p;
   6613 
   6614    lzw_cs = stbi__get8(s);
   6615    if (lzw_cs > 12) return NULL;
   6616    clear = 1 << lzw_cs;
   6617    first = 1;
   6618    codesize = lzw_cs + 1;
   6619    codemask = (1 << codesize) - 1;
   6620    bits = 0;
   6621    valid_bits = 0;
   6622    for (init_code = 0; init_code < clear; init_code++) {
   6623       g->codes[init_code].prefix = -1;
   6624       g->codes[init_code].first = (stbi_uc) init_code;
   6625       g->codes[init_code].suffix = (stbi_uc) init_code;
   6626    }
   6627 
   6628    // support no starting clear code
   6629    avail = clear+2;
   6630    oldcode = -1;
   6631 
   6632    len = 0;
   6633    for(;;) {
   6634       if (valid_bits < codesize) {
   6635          if (len == 0) {
   6636             len = stbi__get8(s); // start new block
   6637             if (len == 0)
   6638                return g->out;
   6639          }
   6640          --len;
   6641          bits |= (stbi__int32) stbi__get8(s) << valid_bits;
   6642          valid_bits += 8;
   6643       } else {
   6644          stbi__int32 code = bits & codemask;
   6645          bits >>= codesize;
   6646          valid_bits -= codesize;
   6647          // @OPTIMIZE: is there some way we can accelerate the non-clear path?
   6648          if (code == clear) {  // clear code
   6649             codesize = lzw_cs + 1;
   6650             codemask = (1 << codesize) - 1;
   6651             avail = clear + 2;
   6652             oldcode = -1;
   6653             first = 0;
   6654          } else if (code == clear + 1) { // end of stream code
   6655             stbi__skip(s, len);
   6656             while ((len = stbi__get8(s)) > 0)
   6657                stbi__skip(s,len);
   6658             return g->out;
   6659          } else if (code <= avail) {
   6660             if (first) {
   6661                return stbi__errpuc("no clear code", "Corrupt GIF");
   6662             }
   6663 
   6664             if (oldcode >= 0) {
   6665                p = &g->codes[avail++];
   6666                if (avail > 8192) {
   6667                   return stbi__errpuc("too many codes", "Corrupt GIF");
   6668                }
   6669 
   6670                p->prefix = (stbi__int16) oldcode;
   6671                p->first = g->codes[oldcode].first;
   6672                p->suffix = (code == avail) ? p->first : g->codes[code].first;
   6673             } else if (code == avail)
   6674                return stbi__errpuc("illegal code in raster", "Corrupt GIF");
   6675 
   6676             stbi__out_gif_code(g, (stbi__uint16) code);
   6677 
   6678             if ((avail & codemask) == 0 && avail <= 0x0FFF) {
   6679                codesize++;
   6680                codemask = (1 << codesize) - 1;
   6681             }
   6682 
   6683             oldcode = code;
   6684          } else {
   6685             return stbi__errpuc("illegal code in raster", "Corrupt GIF");
   6686          }
   6687       }
   6688    }
   6689 }
   6690 
   6691 // this function is designed to support animated gifs, although stb_image doesn't support it
   6692 // two back is the image from two frames ago, used for a very specific disposal format
   6693 static stbi_uc *stbi__gif_load_next(stbi__context *s, stbi__gif *g, int *comp, int req_comp, stbi_uc *two_back)
   6694 {
   6695    int dispose;
   6696    int first_frame;
   6697    int pi;
   6698    int pcount;
   6699    STBI_NOTUSED(req_comp);
   6700 
   6701    // on first frame, any non-written pixels get the background colour (non-transparent)
   6702    first_frame = 0;
   6703    if (g->out == 0) {
   6704       if (!stbi__gif_header(s, g, comp,0)) return 0; // stbi__g_failure_reason set by stbi__gif_header
   6705       if (!stbi__mad3sizes_valid(4, g->w, g->h, 0))
   6706          return stbi__errpuc("too large", "GIF image is too large");
   6707       pcount = g->w * g->h;
   6708       g->out = (stbi_uc *) stbi__malloc(4 * pcount);
   6709       g->background = (stbi_uc *) stbi__malloc(4 * pcount);
   6710       g->history = (stbi_uc *) stbi__malloc(pcount);
   6711       if (!g->out || !g->background || !g->history)
   6712          return stbi__errpuc("outofmem", "Out of memory");
   6713 
   6714       // image is treated as "transparent" at the start - ie, nothing overwrites the current background;
   6715       // background colour is only used for pixels that are not rendered first frame, after that "background"
   6716       // color refers to the color that was there the previous frame.
   6717       memset(g->out, 0x00, 4 * pcount);
   6718       memset(g->background, 0x00, 4 * pcount); // state of the background (starts transparent)
   6719       memset(g->history, 0x00, pcount);        // pixels that were affected previous frame
   6720       first_frame = 1;
   6721    } else {
   6722       // second frame - how do we dispose of the previous one?
   6723       dispose = (g->eflags & 0x1C) >> 2;
   6724       pcount = g->w * g->h;
   6725 
   6726       if ((dispose == 3) && (two_back == 0)) {
   6727          dispose = 2; // if I don't have an image to revert back to, default to the old background
   6728       }
   6729 
   6730       if (dispose == 3) { // use previous graphic
   6731          for (pi = 0; pi < pcount; ++pi) {
   6732             if (g->history[pi]) {
   6733                memcpy( &g->out[pi * 4], &two_back[pi * 4], 4 );
   6734             }
   6735          }
   6736       } else if (dispose == 2) {
   6737          // restore what was changed last frame to background before that frame;
   6738          for (pi = 0; pi < pcount; ++pi) {
   6739             if (g->history[pi]) {
   6740                memcpy( &g->out[pi * 4], &g->background[pi * 4], 4 );
   6741             }
   6742          }
   6743       } else {
   6744          // This is a non-disposal case eithe way, so just
   6745          // leave the pixels as is, and they will become the new background
   6746          // 1: do not dispose
   6747          // 0:  not specified.
   6748       }
   6749 
   6750       // background is what out is after the undoing of the previou frame;
   6751       memcpy( g->background, g->out, 4 * g->w * g->h );
   6752    }
   6753 
   6754    // clear my history;
   6755    memset( g->history, 0x00, g->w * g->h );        // pixels that were affected previous frame
   6756 
   6757    for (;;) {
   6758       int tag = stbi__get8(s);
   6759       switch (tag) {
   6760          case 0x2C: /* Image Descriptor */
   6761          {
   6762             stbi__int32 x, y, w, h;
   6763             stbi_uc *o;
   6764 
   6765             x = stbi__get16le(s);
   6766             y = stbi__get16le(s);
   6767             w = stbi__get16le(s);
   6768             h = stbi__get16le(s);
   6769             if (((x + w) > (g->w)) || ((y + h) > (g->h)))
   6770                return stbi__errpuc("bad Image Descriptor", "Corrupt GIF");
   6771 
   6772             g->line_size = g->w * 4;
   6773             g->start_x = x * 4;
   6774             g->start_y = y * g->line_size;
   6775             g->max_x   = g->start_x + w * 4;
   6776             g->max_y   = g->start_y + h * g->line_size;
   6777             g->cur_x   = g->start_x;
   6778             g->cur_y   = g->start_y;
   6779 
   6780             // if the width of the specified rectangle is 0, that means
   6781             // we may not see *any* pixels or the image is malformed;
   6782             // to make sure this is caught, move the current y down to
   6783             // max_y (which is what out_gif_code checks).
   6784             if (w == 0)
   6785                g->cur_y = g->max_y;
   6786 
   6787             g->lflags = stbi__get8(s);
   6788 
   6789             if (g->lflags & 0x40) {
   6790                g->step = 8 * g->line_size; // first interlaced spacing
   6791                g->parse = 3;
   6792             } else {
   6793                g->step = g->line_size;
   6794                g->parse = 0;
   6795             }
   6796 
   6797             if (g->lflags & 0x80) {
   6798                stbi__gif_parse_colortable(s,g->lpal, 2 << (g->lflags & 7), g->eflags & 0x01 ? g->transparent : -1);
   6799                g->color_table = (stbi_uc *) g->lpal;
   6800             } else if (g->flags & 0x80) {
   6801                g->color_table = (stbi_uc *) g->pal;
   6802             } else
   6803                return stbi__errpuc("missing color table", "Corrupt GIF");
   6804 
   6805             o = stbi__process_gif_raster(s, g);
   6806             if (!o) return NULL;
   6807 
   6808             // if this was the first frame,
   6809             pcount = g->w * g->h;
   6810             if (first_frame && (g->bgindex > 0)) {
   6811                // if first frame, any pixel not drawn to gets the background color
   6812                for (pi = 0; pi < pcount; ++pi) {
   6813                   if (g->history[pi] == 0) {
   6814                      g->pal[g->bgindex][3] = 255; // just in case it was made transparent, undo that; It will be reset next frame if need be;
   6815                      memcpy( &g->out[pi * 4], &g->pal[g->bgindex], 4 );
   6816                   }
   6817                }
   6818             }
   6819 
   6820             return o;
   6821          }
   6822 
   6823          case 0x21: // Comment Extension.
   6824          {
   6825             int len;
   6826             int ext = stbi__get8(s);
   6827             if (ext == 0xF9) { // Graphic Control Extension.
   6828                len = stbi__get8(s);
   6829                if (len == 4) {
   6830                   g->eflags = stbi__get8(s);
   6831                   g->delay = 10 * stbi__get16le(s); // delay - 1/100th of a second, saving as 1/1000ths.
   6832 
   6833                   // unset old transparent
   6834                   if (g->transparent >= 0) {
   6835                      g->pal[g->transparent][3] = 255;
   6836                   }
   6837                   if (g->eflags & 0x01) {
   6838                      g->transparent = stbi__get8(s);
   6839                      if (g->transparent >= 0) {
   6840                         g->pal[g->transparent][3] = 0;
   6841                      }
   6842                   } else {
   6843                      // don't need transparent
   6844                      stbi__skip(s, 1);
   6845                      g->transparent = -1;
   6846                   }
   6847                } else {
   6848                   stbi__skip(s, len);
   6849                   break;
   6850                }
   6851             }
   6852             while ((len = stbi__get8(s)) != 0) {
   6853                stbi__skip(s, len);
   6854             }
   6855             break;
   6856          }
   6857 
   6858          case 0x3B: // gif stream termination code
   6859             return (stbi_uc *) s; // using '1' causes warning on some compilers
   6860 
   6861          default:
   6862             return stbi__errpuc("unknown code", "Corrupt GIF");
   6863       }
   6864    }
   6865 }
   6866 
   6867 static void *stbi__load_gif_main_outofmem(stbi__gif *g, stbi_uc *out, int **delays)
   6868 {
   6869    STBI_FREE(g->out);
   6870    STBI_FREE(g->history);
   6871    STBI_FREE(g->background);
   6872 
   6873    if (out) STBI_FREE(out);
   6874    if (delays && *delays) STBI_FREE(*delays);
   6875    return stbi__errpuc("outofmem", "Out of memory");
   6876 }
   6877 
   6878 static void *stbi__load_gif_main(stbi__context *s, int **delays, int *x, int *y, int *z, int *comp, int req_comp)
   6879 {
   6880    if (stbi__gif_test(s)) {
   6881       int layers = 0;
   6882       stbi_uc *u = 0;
   6883       stbi_uc *out = 0;
   6884       stbi_uc *two_back = 0;
   6885       stbi__gif g;
   6886       int stride;
   6887       int out_size = 0;
   6888       int delays_size = 0;
   6889 
   6890       STBI_NOTUSED(out_size);
   6891       STBI_NOTUSED(delays_size);
   6892 
   6893       memset(&g, 0, sizeof(g));
   6894       if (delays) {
   6895          *delays = 0;
   6896       }
   6897 
   6898       do {
   6899          u = stbi__gif_load_next(s, &g, comp, req_comp, two_back);
   6900          if (u == (stbi_uc *) s) u = 0;  // end of animated gif marker
   6901 
   6902          if (u) {
   6903             *x = g.w;
   6904             *y = g.h;
   6905             ++layers;
   6906             stride = g.w * g.h * 4;
   6907 
   6908             if (out) {
   6909                void *tmp = (stbi_uc*) STBI_REALLOC_SIZED( out, out_size, layers * stride );
   6910                if (!tmp)
   6911                   return stbi__load_gif_main_outofmem(&g, out, delays);
   6912                else {
   6913                    out = (stbi_uc*) tmp;
   6914                    out_size = layers * stride;
   6915                }
   6916 
   6917                if (delays) {
   6918                   int *new_delays = (int*) STBI_REALLOC_SIZED( *delays, delays_size, sizeof(int) * layers );
   6919                   if (!new_delays)
   6920                      return stbi__load_gif_main_outofmem(&g, out, delays);
   6921                   *delays = new_delays;
   6922                   delays_size = layers * sizeof(int);
   6923                }
   6924             } else {
   6925                out = (stbi_uc*)stbi__malloc( layers * stride );
   6926                if (!out)
   6927                   return stbi__load_gif_main_outofmem(&g, out, delays);
   6928                out_size = layers * stride;
   6929                if (delays) {
   6930                   *delays = (int*) stbi__malloc( layers * sizeof(int) );
   6931                   if (!*delays)
   6932                      return stbi__load_gif_main_outofmem(&g, out, delays);
   6933                   delays_size = layers * sizeof(int);
   6934                }
   6935             }
   6936             memcpy( out + ((layers - 1) * stride), u, stride );
   6937             if (layers >= 2) {
   6938                two_back = out - 2 * stride;
   6939             }
   6940 
   6941             if (delays) {
   6942                (*delays)[layers - 1U] = g.delay;
   6943             }
   6944          }
   6945       } while (u != 0);
   6946 
   6947       // free temp buffer;
   6948       STBI_FREE(g.out);
   6949       STBI_FREE(g.history);
   6950       STBI_FREE(g.background);
   6951 
   6952       // do the final conversion after loading everything;
   6953       if (req_comp && req_comp != 4)
   6954          out = stbi__convert_format(out, 4, req_comp, layers * g.w, g.h);
   6955 
   6956       *z = layers;
   6957       return out;
   6958    } else {
   6959       return stbi__errpuc("not GIF", "Image was not as a gif type.");
   6960    }
   6961 }
   6962 
   6963 static void *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
   6964 {
   6965    stbi_uc *u = 0;
   6966    stbi__gif g;
   6967    memset(&g, 0, sizeof(g));
   6968    STBI_NOTUSED(ri);
   6969 
   6970    u = stbi__gif_load_next(s, &g, comp, req_comp, 0);
   6971    if (u == (stbi_uc *) s) u = 0;  // end of animated gif marker
   6972    if (u) {
   6973       *x = g.w;
   6974       *y = g.h;
   6975 
   6976       // moved conversion to after successful load so that the same
   6977       // can be done for multiple frames.
   6978       if (req_comp && req_comp != 4)
   6979          u = stbi__convert_format(u, 4, req_comp, g.w, g.h);
   6980    } else if (g.out) {
   6981       // if there was an error and we allocated an image buffer, free it!
   6982       STBI_FREE(g.out);
   6983    }
   6984 
   6985    // free buffers needed for multiple frame loading;
   6986    STBI_FREE(g.history);
   6987    STBI_FREE(g.background);
   6988 
   6989    return u;
   6990 }
   6991 
   6992 static int stbi__gif_info(stbi__context *s, int *x, int *y, int *comp)
   6993 {
   6994    return stbi__gif_info_raw(s,x,y,comp);
   6995 }
   6996 #endif
   6997 
   6998 // *************************************************************************************************
   6999 // Radiance RGBE HDR loader
   7000 // originally by Nicolas Schulz
   7001 #ifndef STBI_NO_HDR
   7002 static int stbi__hdr_test_core(stbi__context *s, const char *signature)
   7003 {
   7004    int i;
   7005    for (i=0; signature[i]; ++i)
   7006       if (stbi__get8(s) != signature[i])
   7007           return 0;
   7008    stbi__rewind(s);
   7009    return 1;
   7010 }
   7011 
   7012 static int stbi__hdr_test(stbi__context* s)
   7013 {
   7014    int r = stbi__hdr_test_core(s, "#?RADIANCE\n");
   7015    stbi__rewind(s);
   7016    if(!r) {
   7017        r = stbi__hdr_test_core(s, "#?RGBE\n");
   7018        stbi__rewind(s);
   7019    }
   7020    return r;
   7021 }
   7022 
   7023 #define STBI__HDR_BUFLEN  1024
   7024 static char *stbi__hdr_gettoken(stbi__context *z, char *buffer)
   7025 {
   7026    int len=0;
   7027    char c = '\0';
   7028 
   7029    c = (char) stbi__get8(z);
   7030 
   7031    while (!stbi__at_eof(z) && c != '\n') {
   7032       buffer[len++] = c;
   7033       if (len == STBI__HDR_BUFLEN-1) {
   7034          // flush to end of line
   7035          while (!stbi__at_eof(z) && stbi__get8(z) != '\n')
   7036             ;
   7037          break;
   7038       }
   7039       c = (char) stbi__get8(z);
   7040    }
   7041 
   7042    buffer[len] = 0;
   7043    return buffer;
   7044 }
   7045 
   7046 static void stbi__hdr_convert(float *output, stbi_uc *input, int req_comp)
   7047 {
   7048    if ( input[3] != 0 ) {
   7049       float f1;
   7050       // Exponent
   7051       f1 = (float) ldexp(1.0f, input[3] - (int)(128 + 8));
   7052       if (req_comp <= 2)
   7053          output[0] = (input[0] + input[1] + input[2]) * f1 / 3;
   7054       else {
   7055          output[0] = input[0] * f1;
   7056          output[1] = input[1] * f1;
   7057          output[2] = input[2] * f1;
   7058       }
   7059       if (req_comp == 2) output[1] = 1;
   7060       if (req_comp == 4) output[3] = 1;
   7061    } else {
   7062       switch (req_comp) {
   7063          case 4: output[3] = 1; /* fallthrough */
   7064          case 3: output[0] = output[1] = output[2] = 0;
   7065                  break;
   7066          case 2: output[1] = 1; /* fallthrough */
   7067          case 1: output[0] = 0;
   7068                  break;
   7069       }
   7070    }
   7071 }
   7072 
   7073 static float *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
   7074 {
   7075    char buffer[STBI__HDR_BUFLEN];
   7076    char *token;
   7077    int valid = 0;
   7078    int width, height;
   7079    stbi_uc *scanline;
   7080    float *hdr_data;
   7081    int len;
   7082    unsigned char count, value;
   7083    int i, j, k, c1,c2, z;
   7084    const char *headerToken;
   7085    STBI_NOTUSED(ri);
   7086 
   7087    // Check identifier
   7088    headerToken = stbi__hdr_gettoken(s,buffer);
   7089    if (strcmp(headerToken, "#?RADIANCE") != 0 && strcmp(headerToken, "#?RGBE") != 0)
   7090       return stbi__errpf("not HDR", "Corrupt HDR image");
   7091 
   7092    // Parse header
   7093    for(;;) {
   7094       token = stbi__hdr_gettoken(s,buffer);
   7095       if (token[0] == 0) break;
   7096       if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
   7097    }
   7098 
   7099    if (!valid)    return stbi__errpf("unsupported format", "Unsupported HDR format");
   7100 
   7101    // Parse width and height
   7102    // can't use sscanf() if we're not using stdio!
   7103    token = stbi__hdr_gettoken(s,buffer);
   7104    if (strncmp(token, "-Y ", 3))  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
   7105    token += 3;
   7106    height = (int) strtol(token, &token, 10);
   7107    while (*token == ' ') ++token;
   7108    if (strncmp(token, "+X ", 3))  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
   7109    token += 3;
   7110    width = (int) strtol(token, NULL, 10);
   7111 
   7112    if (height > STBI_MAX_DIMENSIONS) return stbi__errpf("too large","Very large image (corrupt?)");
   7113    if (width > STBI_MAX_DIMENSIONS) return stbi__errpf("too large","Very large image (corrupt?)");
   7114 
   7115    *x = width;
   7116    *y = height;
   7117 
   7118    if (comp) *comp = 3;
   7119    if (req_comp == 0) req_comp = 3;
   7120 
   7121    if (!stbi__mad4sizes_valid(width, height, req_comp, sizeof(float), 0))
   7122       return stbi__errpf("too large", "HDR image is too large");
   7123 
   7124    // Read data
   7125    hdr_data = (float *) stbi__malloc_mad4(width, height, req_comp, sizeof(float), 0);
   7126    if (!hdr_data)
   7127       return stbi__errpf("outofmem", "Out of memory");
   7128 
   7129    // Load image data
   7130    // image data is stored as some number of sca
   7131    if ( width < 8 || width >= 32768) {
   7132       // Read flat data
   7133       for (j=0; j < height; ++j) {
   7134          for (i=0; i < width; ++i) {
   7135             stbi_uc rgbe[4];
   7136            main_decode_loop:
   7137             stbi__getn(s, rgbe, 4);
   7138             stbi__hdr_convert(hdr_data + j * width * req_comp + i * req_comp, rgbe, req_comp);
   7139          }
   7140       }
   7141    } else {
   7142       // Read RLE-encoded data
   7143       scanline = NULL;
   7144 
   7145       for (j = 0; j < height; ++j) {
   7146          c1 = stbi__get8(s);
   7147          c2 = stbi__get8(s);
   7148          len = stbi__get8(s);
   7149          if (c1 != 2 || c2 != 2 || (len & 0x80)) {
   7150             // not run-length encoded, so we have to actually use THIS data as a decoded
   7151             // pixel (note this can't be a valid pixel--one of RGB must be >= 128)
   7152             stbi_uc rgbe[4];
   7153             rgbe[0] = (stbi_uc) c1;
   7154             rgbe[1] = (stbi_uc) c2;
   7155             rgbe[2] = (stbi_uc) len;
   7156             rgbe[3] = (stbi_uc) stbi__get8(s);
   7157             stbi__hdr_convert(hdr_data, rgbe, req_comp);
   7158             i = 1;
   7159             j = 0;
   7160             STBI_FREE(scanline);
   7161             goto main_decode_loop; // yes, this makes no sense
   7162          }
   7163          len <<= 8;
   7164          len |= stbi__get8(s);
   7165          if (len != width) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("invalid decoded scanline length", "corrupt HDR"); }
   7166          if (scanline == NULL) {
   7167             scanline = (stbi_uc *) stbi__malloc_mad2(width, 4, 0);
   7168             if (!scanline) {
   7169                STBI_FREE(hdr_data);
   7170                return stbi__errpf("outofmem", "Out of memory");
   7171             }
   7172          }
   7173 
   7174          for (k = 0; k < 4; ++k) {
   7175             int nleft;
   7176             i = 0;
   7177             while ((nleft = width - i) > 0) {
   7178                count = stbi__get8(s);
   7179                if (count > 128) {
   7180                   // Run
   7181                   value = stbi__get8(s);
   7182                   count -= 128;
   7183                   if (count > nleft) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); }
   7184                   for (z = 0; z < count; ++z)
   7185                      scanline[i++ * 4 + k] = value;
   7186                } else {
   7187                   // Dump
   7188                   if (count > nleft) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); }
   7189                   for (z = 0; z < count; ++z)
   7190                      scanline[i++ * 4 + k] = stbi__get8(s);
   7191                }
   7192             }
   7193          }
   7194          for (i=0; i < width; ++i)
   7195             stbi__hdr_convert(hdr_data+(j*width + i)*req_comp, scanline + i*4, req_comp);
   7196       }
   7197       if (scanline)
   7198          STBI_FREE(scanline);
   7199    }
   7200 
   7201    return hdr_data;
   7202 }
   7203 
   7204 static int stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp)
   7205 {
   7206    char buffer[STBI__HDR_BUFLEN];
   7207    char *token;
   7208    int valid = 0;
   7209    int dummy;
   7210 
   7211    if (!x) x = &dummy;
   7212    if (!y) y = &dummy;
   7213    if (!comp) comp = &dummy;
   7214 
   7215    if (stbi__hdr_test(s) == 0) {
   7216        stbi__rewind( s );
   7217        return 0;
   7218    }
   7219 
   7220    for(;;) {
   7221       token = stbi__hdr_gettoken(s,buffer);
   7222       if (token[0] == 0) break;
   7223       if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
   7224    }
   7225 
   7226    if (!valid) {
   7227        stbi__rewind( s );
   7228        return 0;
   7229    }
   7230    token = stbi__hdr_gettoken(s,buffer);
   7231    if (strncmp(token, "-Y ", 3)) {
   7232        stbi__rewind( s );
   7233        return 0;
   7234    }
   7235    token += 3;
   7236    *y = (int) strtol(token, &token, 10);
   7237    while (*token == ' ') ++token;
   7238    if (strncmp(token, "+X ", 3)) {
   7239        stbi__rewind( s );
   7240        return 0;
   7241    }
   7242    token += 3;
   7243    *x = (int) strtol(token, NULL, 10);
   7244    *comp = 3;
   7245    return 1;
   7246 }
   7247 #endif // STBI_NO_HDR
   7248 
   7249 #ifndef STBI_NO_BMP
   7250 static int stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp)
   7251 {
   7252    void *p;
   7253    stbi__bmp_data info;
   7254 
   7255    info.all_a = 255;
   7256    p = stbi__bmp_parse_header(s, &info);
   7257    if (p == NULL) {
   7258       stbi__rewind( s );
   7259       return 0;
   7260    }
   7261    if (x) *x = s->img_x;
   7262    if (y) *y = s->img_y;
   7263    if (comp) {
   7264       if (info.bpp == 24 && info.ma == 0xff000000)
   7265          *comp = 3;
   7266       else
   7267          *comp = info.ma ? 4 : 3;
   7268    }
   7269    return 1;
   7270 }
   7271 #endif
   7272 
   7273 #ifndef STBI_NO_PSD
   7274 static int stbi__psd_info(stbi__context *s, int *x, int *y, int *comp)
   7275 {
   7276    int channelCount, dummy, depth;
   7277    if (!x) x = &dummy;
   7278    if (!y) y = &dummy;
   7279    if (!comp) comp = &dummy;
   7280    if (stbi__get32be(s) != 0x38425053) {
   7281        stbi__rewind( s );
   7282        return 0;
   7283    }
   7284    if (stbi__get16be(s) != 1) {
   7285        stbi__rewind( s );
   7286        return 0;
   7287    }
   7288    stbi__skip(s, 6);
   7289    channelCount = stbi__get16be(s);
   7290    if (channelCount < 0 || channelCount > 16) {
   7291        stbi__rewind( s );
   7292        return 0;
   7293    }
   7294    *y = stbi__get32be(s);
   7295    *x = stbi__get32be(s);
   7296    depth = stbi__get16be(s);
   7297    if (depth != 8 && depth != 16) {
   7298        stbi__rewind( s );
   7299        return 0;
   7300    }
   7301    if (stbi__get16be(s) != 3) {
   7302        stbi__rewind( s );
   7303        return 0;
   7304    }
   7305    *comp = 4;
   7306    return 1;
   7307 }
   7308 
   7309 static int stbi__psd_is16(stbi__context *s)
   7310 {
   7311    int channelCount, depth;
   7312    if (stbi__get32be(s) != 0x38425053) {
   7313        stbi__rewind( s );
   7314        return 0;
   7315    }
   7316    if (stbi__get16be(s) != 1) {
   7317        stbi__rewind( s );
   7318        return 0;
   7319    }
   7320    stbi__skip(s, 6);
   7321    channelCount = stbi__get16be(s);
   7322    if (channelCount < 0 || channelCount > 16) {
   7323        stbi__rewind( s );
   7324        return 0;
   7325    }
   7326    STBI_NOTUSED(stbi__get32be(s));
   7327    STBI_NOTUSED(stbi__get32be(s));
   7328    depth = stbi__get16be(s);
   7329    if (depth != 16) {
   7330        stbi__rewind( s );
   7331        return 0;
   7332    }
   7333    return 1;
   7334 }
   7335 #endif
   7336 
   7337 #ifndef STBI_NO_PIC
   7338 static int stbi__pic_info(stbi__context *s, int *x, int *y, int *comp)
   7339 {
   7340    int act_comp=0,num_packets=0,chained,dummy;
   7341    stbi__pic_packet packets[10];
   7342 
   7343    if (!x) x = &dummy;
   7344    if (!y) y = &dummy;
   7345    if (!comp) comp = &dummy;
   7346 
   7347    if (!stbi__pic_is4(s,"\x53\x80\xF6\x34")) {
   7348       stbi__rewind(s);
   7349       return 0;
   7350    }
   7351 
   7352    stbi__skip(s, 88);
   7353 
   7354    *x = stbi__get16be(s);
   7355    *y = stbi__get16be(s);
   7356    if (stbi__at_eof(s)) {
   7357       stbi__rewind( s);
   7358       return 0;
   7359    }
   7360    if ( (*x) != 0 && (1 << 28) / (*x) < (*y)) {
   7361       stbi__rewind( s );
   7362       return 0;
   7363    }
   7364 
   7365    stbi__skip(s, 8);
   7366 
   7367    do {
   7368       stbi__pic_packet *packet;
   7369 
   7370       if (num_packets==sizeof(packets)/sizeof(packets[0]))
   7371          return 0;
   7372 
   7373       packet = &packets[num_packets++];
   7374       chained = stbi__get8(s);
   7375       packet->size    = stbi__get8(s);
   7376       packet->type    = stbi__get8(s);
   7377       packet->channel = stbi__get8(s);
   7378       act_comp |= packet->channel;
   7379 
   7380       if (stbi__at_eof(s)) {
   7381           stbi__rewind( s );
   7382           return 0;
   7383       }
   7384       if (packet->size != 8) {
   7385           stbi__rewind( s );
   7386           return 0;
   7387       }
   7388    } while (chained);
   7389 
   7390    *comp = (act_comp & 0x10 ? 4 : 3);
   7391 
   7392    return 1;
   7393 }
   7394 #endif
   7395 
   7396 // *************************************************************************************************
   7397 // Portable Gray Map and Portable Pixel Map loader
   7398 // by Ken Miller
   7399 //
   7400 // PGM: http://netpbm.sourceforge.net/doc/pgm.html
   7401 // PPM: http://netpbm.sourceforge.net/doc/ppm.html
   7402 //
   7403 // Known limitations:
   7404 //    Does not support comments in the header section
   7405 //    Does not support ASCII image data (formats P2 and P3)
   7406 
   7407 #ifndef STBI_NO_PNM
   7408 
   7409 static int      stbi__pnm_test(stbi__context *s)
   7410 {
   7411    char p, t;
   7412    p = (char) stbi__get8(s);
   7413    t = (char) stbi__get8(s);
   7414    if (p != 'P' || (t != '5' && t != '6')) {
   7415        stbi__rewind( s );
   7416        return 0;
   7417    }
   7418    return 1;
   7419 }
   7420 
   7421 static void *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
   7422 {
   7423    stbi_uc *out;
   7424    STBI_NOTUSED(ri);
   7425 
   7426    ri->bits_per_channel = stbi__pnm_info(s, (int *)&s->img_x, (int *)&s->img_y, (int *)&s->img_n);
   7427    if (ri->bits_per_channel == 0)
   7428       return 0;
   7429 
   7430    if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
   7431    if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
   7432 
   7433    *x = s->img_x;
   7434    *y = s->img_y;
   7435    if (comp) *comp = s->img_n;
   7436 
   7437    if (!stbi__mad4sizes_valid(s->img_n, s->img_x, s->img_y, ri->bits_per_channel / 8, 0))
   7438       return stbi__errpuc("too large", "PNM too large");
   7439 
   7440    out = (stbi_uc *) stbi__malloc_mad4(s->img_n, s->img_x, s->img_y, ri->bits_per_channel / 8, 0);
   7441    if (!out) return stbi__errpuc("outofmem", "Out of memory");
   7442    stbi__getn(s, out, s->img_n * s->img_x * s->img_y * (ri->bits_per_channel / 8));
   7443 
   7444    if (req_comp && req_comp != s->img_n) {
   7445       out = stbi__convert_format(out, s->img_n, req_comp, s->img_x, s->img_y);
   7446       if (out == NULL) return out; // stbi__convert_format frees input on failure
   7447    }
   7448    return out;
   7449 }
   7450 
   7451 static int      stbi__pnm_isspace(char c)
   7452 {
   7453    return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r';
   7454 }
   7455 
   7456 static void     stbi__pnm_skip_whitespace(stbi__context *s, char *c)
   7457 {
   7458    for (;;) {
   7459       while (!stbi__at_eof(s) && stbi__pnm_isspace(*c))
   7460          *c = (char) stbi__get8(s);
   7461 
   7462       if (stbi__at_eof(s) || *c != '#')
   7463          break;
   7464 
   7465       while (!stbi__at_eof(s) && *c != '\n' && *c != '\r' )
   7466          *c = (char) stbi__get8(s);
   7467    }
   7468 }
   7469 
   7470 static int      stbi__pnm_isdigit(char c)
   7471 {
   7472    return c >= '0' && c <= '9';
   7473 }
   7474 
   7475 static int      stbi__pnm_getinteger(stbi__context *s, char *c)
   7476 {
   7477    int value = 0;
   7478 
   7479    while (!stbi__at_eof(s) && stbi__pnm_isdigit(*c)) {
   7480       value = value*10 + (*c - '0');
   7481       *c = (char) stbi__get8(s);
   7482    }
   7483 
   7484    return value;
   7485 }
   7486 
   7487 static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp)
   7488 {
   7489    int maxv, dummy;
   7490    char c, p, t;
   7491 
   7492    if (!x) x = &dummy;
   7493    if (!y) y = &dummy;
   7494    if (!comp) comp = &dummy;
   7495 
   7496    stbi__rewind(s);
   7497 
   7498    // Get identifier
   7499    p = (char) stbi__get8(s);
   7500    t = (char) stbi__get8(s);
   7501    if (p != 'P' || (t != '5' && t != '6')) {
   7502        stbi__rewind(s);
   7503        return 0;
   7504    }
   7505 
   7506    *comp = (t == '6') ? 3 : 1;  // '5' is 1-component .pgm; '6' is 3-component .ppm
   7507 
   7508    c = (char) stbi__get8(s);
   7509    stbi__pnm_skip_whitespace(s, &c);
   7510 
   7511    *x = stbi__pnm_getinteger(s, &c); // read width
   7512    stbi__pnm_skip_whitespace(s, &c);
   7513 
   7514    *y = stbi__pnm_getinteger(s, &c); // read height
   7515    stbi__pnm_skip_whitespace(s, &c);
   7516 
   7517    maxv = stbi__pnm_getinteger(s, &c);  // read max value
   7518    if (maxv > 65535)
   7519       return stbi__err("max value > 65535", "PPM image supports only 8-bit and 16-bit images");
   7520    else if (maxv > 255)
   7521       return 16;
   7522    else
   7523       return 8;
   7524 }
   7525 
   7526 static int stbi__pnm_is16(stbi__context *s)
   7527 {
   7528    if (stbi__pnm_info(s, NULL, NULL, NULL) == 16)
   7529 	   return 1;
   7530    return 0;
   7531 }
   7532 #endif
   7533 
   7534 static int stbi__info_main(stbi__context *s, int *x, int *y, int *comp)
   7535 {
   7536    #ifndef STBI_NO_JPEG
   7537    if (stbi__jpeg_info(s, x, y, comp)) return 1;
   7538    #endif
   7539 
   7540    #ifndef STBI_NO_PNG
   7541    if (stbi__png_info(s, x, y, comp))  return 1;
   7542    #endif
   7543 
   7544    #ifndef STBI_NO_GIF
   7545    if (stbi__gif_info(s, x, y, comp))  return 1;
   7546    #endif
   7547 
   7548    #ifndef STBI_NO_BMP
   7549    if (stbi__bmp_info(s, x, y, comp))  return 1;
   7550    #endif
   7551 
   7552    #ifndef STBI_NO_PSD
   7553    if (stbi__psd_info(s, x, y, comp))  return 1;
   7554    #endif
   7555 
   7556    #ifndef STBI_NO_PIC
   7557    if (stbi__pic_info(s, x, y, comp))  return 1;
   7558    #endif
   7559 
   7560    #ifndef STBI_NO_PNM
   7561    if (stbi__pnm_info(s, x, y, comp))  return 1;
   7562    #endif
   7563 
   7564    #ifndef STBI_NO_HDR
   7565    if (stbi__hdr_info(s, x, y, comp))  return 1;
   7566    #endif
   7567 
   7568    // test tga last because it's a crappy test!
   7569    #ifndef STBI_NO_TGA
   7570    if (stbi__tga_info(s, x, y, comp))
   7571        return 1;
   7572    #endif
   7573    return stbi__err("unknown image type", "Image not of any known type, or corrupt");
   7574 }
   7575 
   7576 static int stbi__is_16_main(stbi__context *s)
   7577 {
   7578    #ifndef STBI_NO_PNG
   7579    if (stbi__png_is16(s))  return 1;
   7580    #endif
   7581 
   7582    #ifndef STBI_NO_PSD
   7583    if (stbi__psd_is16(s))  return 1;
   7584    #endif
   7585 
   7586    #ifndef STBI_NO_PNM
   7587    if (stbi__pnm_is16(s))  return 1;
   7588    #endif
   7589    return 0;
   7590 }
   7591 
   7592 #ifndef STBI_NO_STDIO
   7593 STBIDEF int stbi_info(char const *filename, int *x, int *y, int *comp)
   7594 {
   7595     FILE *f = stbi__fopen(filename, "rb");
   7596     int result;
   7597     if (!f) return stbi__err("can't fopen", "Unable to open file");
   7598     result = stbi_info_from_file(f, x, y, comp);
   7599     fclose(f);
   7600     return result;
   7601 }
   7602 
   7603 STBIDEF int stbi_info_from_file(FILE *f, int *x, int *y, int *comp)
   7604 {
   7605    int r;
   7606    stbi__context s;
   7607    long pos = ftell(f);
   7608    stbi__start_file(&s, f);
   7609    r = stbi__info_main(&s,x,y,comp);
   7610    fseek(f,pos,SEEK_SET);
   7611    return r;
   7612 }
   7613 
   7614 STBIDEF int stbi_is_16_bit(char const *filename)
   7615 {
   7616     FILE *f = stbi__fopen(filename, "rb");
   7617     int result;
   7618     if (!f) return stbi__err("can't fopen", "Unable to open file");
   7619     result = stbi_is_16_bit_from_file(f);
   7620     fclose(f);
   7621     return result;
   7622 }
   7623 
   7624 STBIDEF int stbi_is_16_bit_from_file(FILE *f)
   7625 {
   7626    int r;
   7627    stbi__context s;
   7628    long pos = ftell(f);
   7629    stbi__start_file(&s, f);
   7630    r = stbi__is_16_main(&s);
   7631    fseek(f,pos,SEEK_SET);
   7632    return r;
   7633 }
   7634 #endif // !STBI_NO_STDIO
   7635 
   7636 STBIDEF int stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp)
   7637 {
   7638    stbi__context s;
   7639    stbi__start_mem(&s,buffer,len);
   7640    return stbi__info_main(&s,x,y,comp);
   7641 }
   7642 
   7643 STBIDEF int stbi_info_from_callbacks(stbi_io_callbacks const *c, void *user, int *x, int *y, int *comp)
   7644 {
   7645    stbi__context s;
   7646    stbi__start_callbacks(&s, (stbi_io_callbacks *) c, user);
   7647    return stbi__info_main(&s,x,y,comp);
   7648 }
   7649 
   7650 STBIDEF int stbi_is_16_bit_from_memory(stbi_uc const *buffer, int len)
   7651 {
   7652    stbi__context s;
   7653    stbi__start_mem(&s,buffer,len);
   7654    return stbi__is_16_main(&s);
   7655 }
   7656 
   7657 STBIDEF int stbi_is_16_bit_from_callbacks(stbi_io_callbacks const *c, void *user)
   7658 {
   7659    stbi__context s;
   7660    stbi__start_callbacks(&s, (stbi_io_callbacks *) c, user);
   7661    return stbi__is_16_main(&s);
   7662 }
   7663 
   7664 #endif // STB_IMAGE_IMPLEMENTATION
   7665 
   7666 /*
   7667    revision history:
   7668       2.20  (2019-02-07) support utf8 filenames in Windows; fix warnings and platform ifdefs
   7669       2.19  (2018-02-11) fix warning
   7670       2.18  (2018-01-30) fix warnings
   7671       2.17  (2018-01-29) change sbti__shiftsigned to avoid clang -O2 bug
   7672                          1-bit BMP
   7673                          *_is_16_bit api
   7674                          avoid warnings
   7675       2.16  (2017-07-23) all functions have 16-bit variants;
   7676                          STBI_NO_STDIO works again;
   7677                          compilation fixes;
   7678                          fix rounding in unpremultiply;
   7679                          optimize vertical flip;
   7680                          disable raw_len validation;
   7681                          documentation fixes
   7682       2.15  (2017-03-18) fix png-1,2,4 bug; now all Imagenet JPGs decode;
   7683                          warning fixes; disable run-time SSE detection on gcc;
   7684                          uniform handling of optional "return" values;
   7685                          thread-safe initialization of zlib tables
   7686       2.14  (2017-03-03) remove deprecated STBI_JPEG_OLD; fixes for Imagenet JPGs
   7687       2.13  (2016-11-29) add 16-bit API, only supported for PNG right now
   7688       2.12  (2016-04-02) fix typo in 2.11 PSD fix that caused crashes
   7689       2.11  (2016-04-02) allocate large structures on the stack
   7690                          remove white matting for transparent PSD
   7691                          fix reported channel count for PNG & BMP
   7692                          re-enable SSE2 in non-gcc 64-bit
   7693                          support RGB-formatted JPEG
   7694                          read 16-bit PNGs (only as 8-bit)
   7695       2.10  (2016-01-22) avoid warning introduced in 2.09 by STBI_REALLOC_SIZED
   7696       2.09  (2016-01-16) allow comments in PNM files
   7697                          16-bit-per-pixel TGA (not bit-per-component)
   7698                          info() for TGA could break due to .hdr handling
   7699                          info() for BMP to shares code instead of sloppy parse
   7700                          can use STBI_REALLOC_SIZED if allocator doesn't support realloc
   7701                          code cleanup
   7702       2.08  (2015-09-13) fix to 2.07 cleanup, reading RGB PSD as RGBA
   7703       2.07  (2015-09-13) fix compiler warnings
   7704                          partial animated GIF support
   7705                          limited 16-bpc PSD support
   7706                          #ifdef unused functions
   7707                          bug with < 92 byte PIC,PNM,HDR,TGA
   7708       2.06  (2015-04-19) fix bug where PSD returns wrong '*comp' value
   7709       2.05  (2015-04-19) fix bug in progressive JPEG handling, fix warning
   7710       2.04  (2015-04-15) try to re-enable SIMD on MinGW 64-bit
   7711       2.03  (2015-04-12) extra corruption checking (mmozeiko)
   7712                          stbi_set_flip_vertically_on_load (nguillemot)
   7713                          fix NEON support; fix mingw support
   7714       2.02  (2015-01-19) fix incorrect assert, fix warning
   7715       2.01  (2015-01-17) fix various warnings; suppress SIMD on gcc 32-bit without -msse2
   7716       2.00b (2014-12-25) fix STBI_MALLOC in progressive JPEG
   7717       2.00  (2014-12-25) optimize JPG, including x86 SSE2 & NEON SIMD (ryg)
   7718                          progressive JPEG (stb)
   7719                          PGM/PPM support (Ken Miller)
   7720                          STBI_MALLOC,STBI_REALLOC,STBI_FREE
   7721                          GIF bugfix -- seemingly never worked
   7722                          STBI_NO_*, STBI_ONLY_*
   7723       1.48  (2014-12-14) fix incorrectly-named assert()
   7724       1.47  (2014-12-14) 1/2/4-bit PNG support, both direct and paletted (Omar Cornut & stb)
   7725                          optimize PNG (ryg)
   7726                          fix bug in interlaced PNG with user-specified channel count (stb)
   7727       1.46  (2014-08-26)
   7728               fix broken tRNS chunk (colorkey-style transparency) in non-paletted PNG
   7729       1.45  (2014-08-16)
   7730               fix MSVC-ARM internal compiler error by wrapping malloc
   7731       1.44  (2014-08-07)
   7732               various warning fixes from Ronny Chevalier
   7733       1.43  (2014-07-15)
   7734               fix MSVC-only compiler problem in code changed in 1.42
   7735       1.42  (2014-07-09)
   7736               don't define _CRT_SECURE_NO_WARNINGS (affects user code)
   7737               fixes to stbi__cleanup_jpeg path
   7738               added STBI_ASSERT to avoid requiring assert.h
   7739       1.41  (2014-06-25)
   7740               fix search&replace from 1.36 that messed up comments/error messages
   7741       1.40  (2014-06-22)
   7742               fix gcc struct-initialization warning
   7743       1.39  (2014-06-15)
   7744               fix to TGA optimization when req_comp != number of components in TGA;
   7745               fix to GIF loading because BMP wasn't rewinding (whoops, no GIFs in my test suite)
   7746               add support for BMP version 5 (more ignored fields)
   7747       1.38  (2014-06-06)
   7748               suppress MSVC warnings on integer casts truncating values
   7749               fix accidental rename of 'skip' field of I/O
   7750       1.37  (2014-06-04)
   7751               remove duplicate typedef
   7752       1.36  (2014-06-03)
   7753               convert to header file single-file library
   7754               if de-iphone isn't set, load iphone images color-swapped instead of returning NULL
   7755       1.35  (2014-05-27)
   7756               various warnings
   7757               fix broken STBI_SIMD path
   7758               fix bug where stbi_load_from_file no longer left file pointer in correct place
   7759               fix broken non-easy path for 32-bit BMP (possibly never used)
   7760               TGA optimization by Arseny Kapoulkine
   7761       1.34  (unknown)
   7762               use STBI_NOTUSED in stbi__resample_row_generic(), fix one more leak in tga failure case
   7763       1.33  (2011-07-14)
   7764               make stbi_is_hdr work in STBI_NO_HDR (as specified), minor compiler-friendly improvements
   7765       1.32  (2011-07-13)
   7766               support for "info" function for all supported filetypes (SpartanJ)
   7767       1.31  (2011-06-20)
   7768               a few more leak fixes, bug in PNG handling (SpartanJ)
   7769       1.30  (2011-06-11)
   7770               added ability to load files via callbacks to accomidate custom input streams (Ben Wenger)
   7771               removed deprecated format-specific test/load functions
   7772               removed support for installable file formats (stbi_loader) -- would have been broken for IO callbacks anyway
   7773               error cases in bmp and tga give messages and don't leak (Raymond Barbiero, grisha)
   7774               fix inefficiency in decoding 32-bit BMP (David Woo)
   7775       1.29  (2010-08-16)
   7776               various warning fixes from Aurelien Pocheville
   7777       1.28  (2010-08-01)
   7778               fix bug in GIF palette transparency (SpartanJ)
   7779       1.27  (2010-08-01)
   7780               cast-to-stbi_uc to fix warnings
   7781       1.26  (2010-07-24)
   7782               fix bug in file buffering for PNG reported by SpartanJ
   7783       1.25  (2010-07-17)
   7784               refix trans_data warning (Won Chun)
   7785       1.24  (2010-07-12)
   7786               perf improvements reading from files on platforms with lock-heavy fgetc()
   7787               minor perf improvements for jpeg
   7788               deprecated type-specific functions so we'll get feedback if they're needed
   7789               attempt to fix trans_data warning (Won Chun)
   7790       1.23    fixed bug in iPhone support
   7791       1.22  (2010-07-10)
   7792               removed image *writing* support
   7793               stbi_info support from Jetro Lauha
   7794               GIF support from Jean-Marc Lienher
   7795               iPhone PNG-extensions from James Brown
   7796               warning-fixes from Nicolas Schulz and Janez Zemva (i.stbi__err. Janez (U+017D)emva)
   7797       1.21    fix use of 'stbi_uc' in header (reported by jon blow)
   7798       1.20    added support for Softimage PIC, by Tom Seddon
   7799       1.19    bug in interlaced PNG corruption check (found by ryg)
   7800       1.18  (2008-08-02)
   7801               fix a threading bug (local mutable static)
   7802       1.17    support interlaced PNG
   7803       1.16    major bugfix - stbi__convert_format converted one too many pixels
   7804       1.15    initialize some fields for thread safety
   7805       1.14    fix threadsafe conversion bug
   7806               header-file-only version (#define STBI_HEADER_FILE_ONLY before including)
   7807       1.13    threadsafe
   7808       1.12    const qualifiers in the API
   7809       1.11    Support installable IDCT, colorspace conversion routines
   7810       1.10    Fixes for 64-bit (don't use "unsigned long")
   7811               optimized upsampling by Fabian "ryg" Giesen
   7812       1.09    Fix format-conversion for PSD code (bad global variables!)
   7813       1.08    Thatcher Ulrich's PSD code integrated by Nicolas Schulz
   7814       1.07    attempt to fix C++ warning/errors again
   7815       1.06    attempt to fix C++ warning/errors again
   7816       1.05    fix TGA loading to return correct *comp and use good luminance calc
   7817       1.04    default float alpha is 1, not 255; use 'void *' for stbi_image_free
   7818       1.03    bugfixes to STBI_NO_STDIO, STBI_NO_HDR
   7819       1.02    support for (subset of) HDR files, float interface for preferred access to them
   7820       1.01    fix bug: possible bug in handling right-side up bmps... not sure
   7821               fix bug: the stbi__bmp_load() and stbi__tga_load() functions didn't work at all
   7822       1.00    interface to zlib that skips zlib header
   7823       0.99    correct handling of alpha in palette
   7824       0.98    TGA loader by lonesock; dynamically add loaders (untested)
   7825       0.97    jpeg errors on too large a file; also catch another malloc failure
   7826       0.96    fix detection of invalid v value - particleman@mollyrocket forum
   7827       0.95    during header scan, seek to markers in case of padding
   7828       0.94    STBI_NO_STDIO to disable stdio usage; rename all #defines the same
   7829       0.93    handle jpegtran output; verbose errors
   7830       0.92    read 4,8,16,24,32-bit BMP files of several formats
   7831       0.91    output 24-bit Windows 3.0 BMP files
   7832       0.90    fix a few more warnings; bump version number to approach 1.0
   7833       0.61    bugfixes due to Marc LeBlanc, Christopher Lloyd
   7834       0.60    fix compiling as c++
   7835       0.59    fix warnings: merge Dave Moore's -Wall fixes
   7836       0.58    fix bug: zlib uncompressed mode len/nlen was wrong endian
   7837       0.57    fix bug: jpg last huffman symbol before marker was >9 bits but less than 16 available
   7838       0.56    fix bug: zlib uncompressed mode len vs. nlen
   7839       0.55    fix bug: restart_interval not initialized to 0
   7840       0.54    allow NULL for 'int *comp'
   7841       0.53    fix bug in png 3->4; speedup png decoding
   7842       0.52    png handles req_comp=3,4 directly; minor cleanup; jpeg comments
   7843       0.51    obey req_comp requests, 1-component jpegs return as 1-component,
   7844               on 'test' only check type, not whether we support this variant
   7845       0.50  (2006-11-19)
   7846               first released version
   7847 */
   7848 
   7849 
   7850 /*
   7851 ------------------------------------------------------------------------------
   7852 This software is available under 2 licenses -- choose whichever you prefer.
   7853 ------------------------------------------------------------------------------
   7854 ALTERNATIVE A - MIT License
   7855 Copyright (c) 2017 Sean Barrett
   7856 Permission is hereby granted, free of charge, to any person obtaining a copy of
   7857 this software and associated documentation files (the "Software"), to deal in
   7858 the Software without restriction, including without limitation the rights to
   7859 use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
   7860 of the Software, and to permit persons to whom the Software is furnished to do
   7861 so, subject to the following conditions:
   7862 The above copyright notice and this permission notice shall be included in all
   7863 copies or substantial portions of the Software.
   7864 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
   7865 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
   7866 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
   7867 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
   7868 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
   7869 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
   7870 SOFTWARE.
   7871 ------------------------------------------------------------------------------
   7872 ALTERNATIVE B - Public Domain (www.unlicense.org)
   7873 This is free and unencumbered software released into the public domain.
   7874 Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
   7875 software, either in source code form or as a compiled binary, for any purpose,
   7876 commercial or non-commercial, and by any means.
   7877 In jurisdictions that recognize copyright laws, the author or authors of this
   7878 software dedicate any and all copyright interest in the software to the public
   7879 domain. We make this dedication for the benefit of the public at large and to
   7880 the detriment of our heirs and successors. We intend this dedication to be an
   7881 overt act of relinquishment in perpetuity of all present and future rights to
   7882 this software under copyright law.
   7883 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
   7884 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
   7885 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
   7886 AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
   7887 ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
   7888 WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
   7889 ------------------------------------------------------------------------------
   7890 */