function err=LoadClut(windowPtr,clut,startEntry,bits)
% [err]=LoadClut(windowPtrOrScreenNumber,clut,[startEntry],[bits])
% 
% Load the hardware color lookup table (CLUT) of a video screen. It uses
% Screen 'Gamma' and 'SetClut', as appropriate, to leave the hardware CLUT
% containing the numbers you provide in "clut", with no transformation.
% There are no restrictions. All pixelSizes are supported. Works with all
% Mac graphics cards. Fully supports 8-or-more-bit DACs. 
% 
% We strongly suggest that all users call LoadClut instead of SetClut. There
% are many reasons in favor, and essentially none against. LoadClut does
% take longer, but not enough to matter. Try ClutTimeTest.
% 
% We advise against mixed use of LoadClut and SetClut. LoadClut uses
% the gamma table as it sees fit to do its work, and the effect of
% calling SetClut depends on the current state of the gamma table. So
% you may get confusing results if you try to mix use of SetClut and
% LoadClut. Our advice is that you use only LoadClut. Don't use SetClut 
% or SetGamma. Let LoadClut do the work for you. (Suggested by Zing Lee.)
% 
% SPEED: LoadClut and SetClut are typically called from within a display
% loop, so it's important that they be fast. You typically call one or the
% other once per frame. Frame rate is typically in the range 60 to 120 Hz,
% so the frame period, from blanking to blanking, will typically be 8 to
% 17 ms. You'd like the time used by SetClut and LoadClut to be a
% negligible fraction of that. LoadClut needs to initialize some tables
% the first time it's called, and has to recompute those tables whenever
% pixelSize or bits or fixATIRadeon7000 has changed. The rest of the time,
% LoadClut is fast. This is confirmed by ClutTimeTest. When SetClut waits
% for blanking, nearly all the time is spent waiting, whether you call
% SetClut or LoadClut. When SetClut is set not to wait, then SetClut is
% typically very quick. Depending on your driver, it may take roughly 0.3
% ms and iterate at 3 kHz. 0.3 ms is a negligible fraction of the frame
% period. Under the same conditions (bits==8 on PowerMac G4/500), LoadClut
% iterates at nearly 1 kHz, taking a bit more than 1 ms per iteration.
% This is not quite negligible, but should be tolerable in most
% applications. We haven't yet timed LoadClut when bits==10.
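% 
% A rough way to check the per-call cost yourself, in the spirit of
% ClutTimeTest (this is only a sketch, not the code of ClutTimeTest; it
% assumes "window" is an already-open window with an 8-bit DAC and a
% 256-entry CLUT, and that GetSecs is on your path; the measured time
% includes any waiting for blanking):
% 	clut=(0:255)'*[1 1 1];	% identity gray ramp, values 0 to 255
% 	LoadClut(window,clut);	% first call builds the tables; don't time it
% 	n=100;
% 	t0=GetSecs;
% 	for i=1:n
% 		LoadClut(window,clut);
% 	end
% 	fprintf('LoadClut: %.2f ms per call\n',1000*(GetSecs-t0)/n);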
% 
% NOTE: In the Mac OS, one must use both cscSetEntries/cscDirectSetEntries
% (ie Screen 'SetClut') and cscSetGamma (ie Screen 'Gamma') to determine
% the contents of the hardware CLUT. This is needlessly confusing. We
% suggest that all users use LoadClut.m instead, which does the whole job
% in one call.
% 
% FUNCTION ARGUMENTS:
% 
% If the "err" output struct argument is present then all warnings are 
% suppressed. "err" will normally be empty. If errors occurred, "err"
% will have fields err.setGamma or err.setClut with the error code 
% returned by cscSetGamma or cscSetEntries/cscDirectSetEntries.
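% 
% For example, to check for errors yourself (a sketch; "window" and
% "clut" are assumed to exist already):
% 	err=LoadClut(window,clut);
% 	if ~isempty(err)
% 		if isfield(err,'setGamma')
% 			fprintf('Screen ''Gamma'' error %d\n',err.setGamma);
% 		end
% 		if isfield(err,'setClut')
% 			fprintf('Screen ''SetClut'' error %d\n',err.setClut);
% 		end
% 	end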
% 
% "clut", the user-supplied color table, should be a matrix with 3 columns
% (R, G, B) and at most clutSize rows. Each row in the "clut" matrix is
% loaded into an RGB entry in the hardware CLUT. The values of the matrix
% elements should be integers in the range 0 to 2^bits-1. In order to
% prepare the clut correctly, user programs must ascertain clutSize and
% DAC bits:
% 		clutSize=ScreenClutSize(windowPtr);
% 		bits=ScreenDacBits(windowPtr);
% 
% The clutSize (number of entries in the CLUT) depends on the pixelSize:
% 	pixelSize  clutSize
% 	        1         2
% 	        2         4
% 	        4        16
% 	        8       256
% 	       16        32
% 	       32       256
% 
% "startEntry" is optional and determines which hardware CLUT entry to
% load first. Entries are numbered from 0 up. The default is 0. The first
% row of "clut", i.e. clut(1,:), will be loaded into hardware entry
% "startEntry".
% 
% "bits" specifies how many bits you want to write to the CLUT. Typically
% it will be ScreenDacBits, which is the default value. If you set it to
% some other value, the range of allowable entries scales accordingly.
% Thus if you use a 10-bit CLUT, then each entry should be between 0 and
% 1023, etc.
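% 
% For example, to load a linear gray ramp that fills the CLUT and uses the
% full DAC range, and then overwrite the last two entries with black and
% white (a sketch only; "window" is assumed to be an already-open window):
% 	clutSize=ScreenClutSize(window);
% 	bits=ScreenDacBits(window);
% 	maxValue=2^bits-1;	% e.g. 255 for 8 bits, 1023 for 10 bits
% 	ramp=round(maxValue*(0:clutSize-1)'/(clutSize-1));
% 	LoadClut(window,ramp*[1 1 1],0,bits);
% 	LoadClut(window,[0 0 0; maxValue maxValue maxValue],clutSize-2,bits);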
% 
% PrepareScreen:
% 
% LoadClut depends critically on the various Screen Preference values
% having been set up correctly by PrepareScreen. It automatically calls
% PrepareScreen if PrepareScreen has not yet been run (because you haven't
% yet opened a window or explicitly called PrepareScreen for this screen).
% You don't need to worry about this, but you should be aware that
% PrepareScreen takes several seconds to run, the first time it is run, so
% your first call to LoadClut may be slow.
% 
% GRAPHICS CARDS WITH MORE-THAN-8-BIT DACS:
% 
% The ATI Radeon and Radius ThunderPower (no longer sold) have 10-bit
% DACs. The BITS++ adapter from Cambridge Research Systems has 14-bit
% DACs.
% http://www.crsltd.com/catalog/bits++/
% 
% You may wish to enable/disable a fix in LoadClut to a minor Radeon 7000
% driver bug that otherwise produces a minimal 1 step output error (1 part
% in 1024). To do so, go to PrepareScreen.m and comment/uncomment the line
% (near line 128) that sets screenGlobal(s).fixATIRadeon7000.
% 
% BACKGROUND:
% 
% Screen 'Gamma' and 'SetClut' are Psychtoolbox routines that call the
% graphics card driver using Apple's low-level cscSetGamma, cscSetEntries,
% and cscDirectSetEntries. In general you should assume that LoadClut will
% overwrite both the gamma table and the CLUT.
% 
% The loading of the hardware CLUT is achieved slightly differently
% depending on the pixelSize. In both cases we begin by calling Screen
% 'Gamma'. In 1, 2, 4, or 8-bit pixel mode, Screen 'Gamma' merely replaces
% the driver's software gamma table, without touching the hardware CLUT,
% so we follow that by a call to Screen 'SetClut', which loads the
% hardware CLUT. In 16- or 32-bit-pixel mode, the call to 'Gamma'
% implicitly does the SetClut, so we don't have to.
% 
% WAIT FOR BLANKING: The wait-for-blanking behavior of LoadClut is the
% same as that of Screen 'SetClut', and responds to the same controls:
% Screen Preference AskSetClutDriverToWaitForBlanking,
% SetClutDriverWaitsForBlanking, SetClutCallsWaitBlanking,
% SetClutPunchesBlankingClock. Most users should ignore these controls and
% simply enjoy the standard default behavior, so that your program will
% operate consistently with all video cards, allowing the code in
% PrepareScreen to do whatever driver-specific customization is necessary.
% 
% Initial timing results, on the ATI Rage128, indicate that the
% synchronization of the implicit call to SetClut is just like that for
% the explicit, so we're setting things up in the same way and hope that
% the implicit and explicit calls will behave alike in other drivers as
% well.
% 
% SetClutDuplicates8Bits: most users will never deal with this flag, as it
% is set optimally for each graphics driver by PrepareScreen.m. However, if
% you do change it, be aware that it only affects future calls to SetClut,
% and that when pixelSize>8 and dacBits>8 LoadClut won't call SetClut
% unless you tell it to, by clearing the cached flag:
% 	screenGlobal(screenNumber+1).identityColorTableLoaded=0;
% 
% WARNING: Apple's release notes for Mac OS 9.1 warn that "The Color
% Manager has been changed so that requests for white and black on 8 bit
% devices now use the documented requirement that white is the first entry
% in the palette and black the last entry. This will cause problems if
% applications have custom palettes which do not have white and black in
% these required positions."
% Our understanding of this warning is that it's meant quite literally.
% The Mac OS (i.e. QuickDraw) cares about the values in the software
% Palette maintained by QuickDraw for your screen. It does not know or
% care what values you ask the driver to put into the hardware CLUT.
% web http://developer.apple.com/technotes/tn/tn2010.html ;
% 
% See also ScreenDacBits, ScreenClutSize, LoadClutTest, and ClutTest.

% 8/22/00  dhb    Wrote it, as "SetClut".
% 10/3/01  bds		Changed RADIUS to GAMMA10 to include RADEON since it acts like RADIUS.
% 1/25/02  dhb    Incorporate bds changes into master version.
% 2/01/02  dhb    Remove bits_SetColor, add GAMMA10.clut.  Logic rewrite. 
% 2/28/02  dhb,ly,kr  Deal with high 10-bit cards.
% 3/20/02  dgp	  Cosmetic.
% 3/21/02  dgp	  Added note above regarding the Mac OS 9.1 Color Manager assumptions about the CLUT.
% 6/7/02   dgp    Renamed to "LoadClut", extensively revised to work on its own, without OpenWindow and CloseWindow.
% 6/22/02  dgp    Streamlined.
% 6/23/02  dgp    Added err.
% 6/25/02  dgp    Use screenGlobal.fixATIRadeon7000. Recompute tables whenever that flag changes.
% 6/28/02  dgp    Call PrepareScreen if that hasn't already been done.
% 7/2/02   dgp    Added note above, recommending LoadClut over SetClut, in response to query by David Jones.
% 7/24/02  dgp    When doing fixATIRadeon7000, clip "identityGamma" at 2^bits-1, to match treatment of "gamma".
% 8/7/02   dgp    Fixed bug, reported by David Brainard, that caused failure when attempting to load fewer than 3 entries.
% 8/24/02  dgp    Fine tune the test for stale tables to not provoke warnings the first time Screen 1 is used.
% 12/18/04 awi    Added Windows section.


if(strcmp(computer,'MAC2'))
	global screenGlobal % Static cache
	
	% Check the arguments. Test nargin first, so that the usage message
	% appears even when windowPtr is missing.
	if nargin<2 | nargin>4
		error('USAGE: LoadClut(windowPtr,clut,[startEntry],[bits])');
	end
	s=1+Screen(windowPtr,'WindowScreenNumber');
	if nargin<4 
		bits=ScreenDacBits(windowPtr);
	end
	if nargin<3 
		startEntry=0;
	end
	clutSize=ScreenClutSize(windowPtr);
	if startEntry<0 | startEntry>clutSize-1
		error(sprintf('startEntry %d must be in range 0 to %d',startEntry,clutSize-1));
	end
	pixelSize=Screen(windowPtr,'PixelSize');
	if size(clut,1)>clutSize
		error(sprintf('Sorry, your %d-element "clut" is longer than the %d-entry hardware CLUT.',size(clut,1),clutSize));
	end
	if startEntry+size(clut,1)>clutSize
		error(sprintf('Sorry, startEntry %d is too high or %d-element "clut" is too long for the %d-entry hardware CLUT.',startEntry,size(clut,1),clutSize));
	end
	if max(clut(:))>2^bits-1 | min(clut(:))<0
		error(sprintf('"clut" values must be in range 0 to %d',2^bits-1));
	end
	err=[];
	if isempty(screenGlobal) | ~screenGlobal(s).open
		% 	warning('LoadClut: you didn''t open a window or call PrepareScreen before calling LoadClut.');
		PrepareScreen(s-1);
	end
	
	% Initialize the tables the first time, and whenever pixelSize or bits or fixATIRadeon7000 has changed.
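	% Worked example of the identityGamma formula below: 257*[0:255]
	% spreads 0..255 over the 16-bit range 0..65535, and bitshift by bits-16
	% (a right shift when bits<16) keeps the top "bits" bits. With bits==10,
	% entry 255 gives 257*255=65535, and floor(65535/2^6)=1023; entry 128
	% gives 257*128=32896, and floor(32896/2^6)=514. With bits==8 the table
	% is simply 0..255.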
	if length(screenGlobal)<s | ~isfield(screenGlobal(s),'pixelSize') ...
		| isempty(screenGlobal(s).pixelSize) | pixelSize~=screenGlobal(s).pixelSize ...
		| isempty(screenGlobal(s).bits) | bits~=screenGlobal(s).bits ...
		| screenGlobal(s).fixATIRadeon7000~=screenGlobal(s).fixedATIRadeon7000
		screenGlobal(s).identityGamma=bitshift(257*[0:255]'*[1 1 1],bits-16);
		screenGlobal(s).identityGammaLoaded=0;
		screenGlobal(s).identityColorTable=[0:clutSize-1]'*[1,1,1];
		screenGlobal(s).identityColorTableLoaded=0;
		screenGlobal(s).pixelSize=pixelSize;
		screenGlobal(s).bits=bits;
		screenGlobal(s).gamma=screenGlobal(s).identityGamma;
		screenGlobal(s).colorTable=screenGlobal(s).identityColorTable;
		if screenGlobal(s).fixATIRadeon7000
			% fix error in ATI Radeon 7000 driver
			if bits>8
				screenGlobal(s).identityGamma=min(screenGlobal(s).identityGamma+1,2^bits-1);
			end
		end
		screenGlobal(s).fixedATIRadeon7000=screenGlobal(s).fixATIRadeon7000; % remember how tables were made
	end
	
	% Load the CLUT
	if bits==8
		% 8 bit DACs: use identity gamma table and manipulate color table supplied to SetClut.
		if ~screenGlobal(s).identityGammaLoaded
			% Load the identity gamma table.
			[oldGamma,oldBits,gammaError]=Screen(windowPtr,'Gamma',screenGlobal(s).identityGamma,bits);
			if gammaError.set
				err.setGamma=gammaError.set;
				if nargout==0
					fprintf('Error %d in Screen ''Gamma''. Screen %d, bits %d\n',gammaError.set,s-1,bits);
				end
			end
			screenGlobal(s).identityGammaLoaded=1;
		end
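		% Capture the SetClut error code so that err.setClut is reported, as documented above.
		setClutError=Screen(windowPtr,'SetClut',clut,startEntry);
		if setClutError
			err.setClut=setClutError;
			if nargout==0
				fprintf('Error %d in Screen ''SetClut''. Screen %d, bits %d\n',setClutError,s-1,bits);
			end
		end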
		screenGlobal(s).colorTable((1:size(clut,1))+startEntry,:)=clut; % update cached copy of colorTable.
	else
		% >8 bit DACs: manipulate gamma table and supply identity color table to SetClut.
		screenGlobal(s).gamma((1:size(clut,1))+startEntry,:)=clut;
		if screenGlobal(s).fixATIRadeon7000
			gt=screenGlobal(s).gamma;
			% 		inc=find(mod(gt,341)~=0);
			inc=find(gt<2^bits-1); % clip at the DAC maximum (1023 when bits==10), matching identityGamma
			gt(inc)=gt(inc)+1;
			[oldGamma,oldBits,gammaError]=Screen(windowPtr,'Gamma',gt,bits);
		else
			[oldGamma,oldBits,gammaError]=Screen(windowPtr,'Gamma',screenGlobal(s).gamma,bits);   % does implicit SetClut if pixelSize>8.
		end
		if gammaError.set
			err.setGamma=gammaError.set;
			if nargout==0
				fprintf('Error %d in Screen ''Gamma''. Screen %d, bits %d\n',gammaError.set,s-1,bits);
			end
		end
		if pixelSize<=8 | ~screenGlobal(s).identityColorTableLoaded
			% Load the identity color table.
			setClutError=Screen(windowPtr,'SetClut',screenGlobal(s).identityColorTable); % always waits for blanking
			if setClutError
				err.setClut=setClutError;
				if nargout==0
					fprintf('Error %d in Screen ''SetClut''. Screen %d, bits %d\n',setClutError,s-1,bits);
				end
			end
			screenGlobal(s).identityColorTableLoaded=1;
		end
	end
elseif(strcmp(computer,'PCWIN'))
    % On Windows we support only 256-entry CLUTS with 8-bit pixel depth and
    % 8-bit DACS.  Also, win Screen does not use PrepareScreen and the
    % ScreenGlobal. So all we do is call Screen('SetClut'). No error codes
    % are collected on Windows, so "err" is always empty.
    err=[];
    if nargin == 2
        SCREEN(windowPtr,'SetClut',clut);
    elseif nargin==3
        SCREEN(windowPtr,'SetClut',clut,startEntry);
    elseif nargin==4
        SCREEN(windowPtr,'SetClut',clut,startEntry,bits);
    else
        error('Wrong number of arguments supplied to LoadClut');
    end
else
    error('LoadClut called on unsupported platform')
end
