patched/ruby/ruby-character-encodings.git
Alexey Borzenkov [Wed, 24 Jun 2009 11:20:21 +0000] (master)
Fix compilation with cygwin 1.5

Cygwin 1.5 doesn't define wcsxfrm, and Cygwin 1.7 implements it
as wcslcpy, so under cygwin it should be safe to define wcsxfrm as
wcslcpy.

Alexey Borzenkov [Tue, 14 Apr 2009 08:50:54 +0000]
Fix compilation under mingw

Mingw does not support visibility("hidden") and aborts
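
A minimal sketch of how such a guard might look. The platform checks and the helper function below are assumptions for illustration; the conditions the project actually uses are not shown in this log.

```c
/* Assumed definition: expand HIDDEN to the GCC visibility attribute only
 * where it is expected to work; elsewhere expand to nothing. */
#if defined(__GNUC__) && __GNUC__ >= 4 && \
    !defined(__MINGW32__) && !defined(__CYGWIN__) && !defined(_WIN32)
#  define HIDDEN __attribute__((visibility("hidden")))
#else
#  define HIDDEN
#endif

/* Hypothetical internal helper: usable inside the library, but not
 * exported from the shared object when HIDDEN is non-empty. */
HIDDEN int internal_helper(int x)
{
    return x + 1;
}
```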

Nikolai Weibull [Mon, 30 Mar 2009 21:15:38 +0000] (upstream)
.gitignore: Ignore *.bundle for Mac OS X

Nikolai Weibull [Mon, 30 Mar 2009 21:15:25 +0000]
Fix compilation directives

Nikolai Weibull [Mon, 30 Mar 2009 19:40:04 +0000]
Add unified binary search and table lookup code

Nikolai Weibull [Fri, 7 Dec 2007 10:15:33 +0000]
Rakefile: v0.4.1

Nikolai Weibull [Fri, 7 Dec 2007 10:15:10 +0000]
ext/encoding/character/utf-8/private.h: Fix HIDDEN macro for non-GCC.

Nikolai Weibull [Tue, 4 Dec 2007 20:13:48 +0000]
Time for v0.4.0

Nikolai Weibull [Tue, 4 Dec 2007 20:00:21 +0000]
Remove inline attribute from _unichar_combining_class and rename it.

Nikolai Weibull [Thu, 22 Nov 2007 15:34:21 +0000]
Time for version 0.3.0.

Nikolai Weibull [Thu, 22 Nov 2007 15:33:07 +0000]
Fix edge case in #reverse.

Nikolai Weibull [Thu, 22 Nov 2007 15:25:44 +0000]
Fix minor bug in handling of Bignums.

This fixes

  [#7037] u("10000").to_i -> 5433328 if (bit_length > sizeof(VALUE)
        * CHAR_BIT)

Thanks go out to Hannes Wyss for reporting this and providing a patch.

Nikolai Weibull [Wed, 1 Aug 2007 14:04:47 +0000]
Simplify combine() somewhat.

Nikolai Weibull [Wed, 1 Aug 2007 14:00:29 +0000]
Simplify lookup_compose() slightly.

Nikolai Weibull [Wed, 1 Aug 2007 13:57:10 +0000]
Simplify the test in decompose_hangul() for slightly more readable code.

Nikolai Weibull [Wed, 1 Aug 2007 13:50:07 +0000]
Use newly defined split Unicode-table lookup functions in more places.

Nikolai Weibull [Wed, 1 Aug 2007 13:42:17 +0000]
Remove old comment.

Nikolai Weibull [Wed, 1 Aug 2007 13:41:01 +0000]
Refactor the split Unicode-table lookup routines.

Nikolai Weibull [Mon, 30 Jul 2007 19:12:17 +0000]
Remove left-over break.

Nikolai Weibull [Mon, 30 Jul 2007 14:31:15 +0000]
Simplify calling-sequence for decompose_hangul().

We might as well return the length of the decomposed character instead of
storing it in a parameter.  This simplifies its use in
normalize_wc_decompose() as we don’t need to introduce a temporary.
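
As an illustration of that calling sequence, here is a sketch of Hangul syllable decomposition that returns the number of code points written. The constants follow the arithmetic given in the Unicode Standard; the function name and signature are assumptions and not the project's actual decompose_hangul().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constants from the Unicode Standard's Hangul decomposition arithmetic. */
#define S_BASE  0xAC00u
#define L_BASE  0x1100u
#define V_BASE  0x1161u
#define T_BASE  0x11A7u
#define V_COUNT 21u
#define T_COUNT 28u
#define N_COUNT (V_COUNT * T_COUNT)   /* 588 */
#define S_COUNT (19u * N_COUNT)       /* 11172 */

/* Hypothetical signature: writes up to three jamo into out and returns
 * how many were written, or 0 if c is not a precomposed syllable. */
size_t decompose_hangul_sketch(uint32_t c, uint32_t out[3])
{
    assert(out != NULL);
    if (c < S_BASE || c >= S_BASE + S_COUNT)
        return 0;

    uint32_t s_index = c - S_BASE;
    uint32_t t_index = s_index % T_COUNT;

    out[0] = L_BASE + s_index / N_COUNT;               /* leading consonant */
    out[1] = V_BASE + (s_index % N_COUNT) / T_COUNT;   /* vowel */
    if (t_index == 0)
        return 2;                                      /* no trailing consonant */
    out[2] = T_BASE + t_index;                         /* trailing consonant */
    return 3;
}
```

Because the length is the return value, a caller can write `len = decompose_hangul_sketch(c, buf);` without declaring a temporary just to receive an out-parameter.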

Nikolai Weibull [Mon, 30 Jul 2007 13:16:00 +0000]
Clean up utf_downcase().

The old utf_downcase() was quite complex and hard to grasp.  This cleans
it up a bit by factoring out the special cases somewhat, along with some
duplicated-code removal.

Nikolai Weibull [Mon, 30 Jul 2007 09:21:46 +0000]
Don’t call setlocale() with an argument.

We shouldn’t be overriding the user’s locale.

Nikolai Weibull [Fri, 27 Jul 2007 14:22:23 +0000]
Factor out binary searches.

Binary search was used in a couple of places to look up values in tables.
This patch factors out that code, making the calling code easier to follow
and ensuring that any binary-search-related bugs are caught at one point
in the code.
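
A sketch of what a factored-out lookup could look like. The entry layout and the function name are hypothetical; the project's real tables and helper are not shown in this log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: a code point and some associated value. */
struct table_entry {
    uint32_t ch;
    int value;
};

/* Returns the entry for c, or NULL if there is none.
 * The table must be sorted in ascending order of ch. */
const struct table_entry *
table_lookup(const struct table_entry *table, size_t n, uint32_t c)
{
    assert(table != NULL || n == 0);

    size_t lo = 0, hi = n;               /* search the half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2; /* avoids overflow of lo + hi */
        if (table[mid].ch == c)
            return &table[mid];
        if (table[mid].ch < c)
            lo = mid + 1;
        else
            hi = mid;
    }
    return NULL;
}
```

Every caller then reduces to a single call plus a NULL check, so an off-by-one in the search only has to be fixed once.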

Nikolai Weibull [Fri, 27 Jul 2007 12:02:00 +0000]
Fix corner-case for special-cased characters.

Nikolai Weibull [Fri, 27 Jul 2007 11:58:44 +0000]
Use OFFSET_IF() in one more place.

Using the OFFSET_IF() macro is a lot easier to understand than the whole
(X != NULL) ? X + LEN : NULL.
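
For illustration, the macro could be defined roughly as follows. This definition is reconstructed from the expression quoted above and is an assumption, as is the buffer_end() caller.

```c
#include <stddef.h>

/* Assumed definition: offset a pointer by len unless it is NULL. */
#define OFFSET_IF(p, len) (((p) != NULL) ? (p) + (len) : NULL)

/* Hypothetical caller: one-past-the-end pointer of a buffer, or NULL
 * when there is no buffer at all. */
const char *buffer_end(const char *buf, size_t len)
{
    return OFFSET_IF(buf, len);
}
```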

Nikolai Weibull [Fri, 27 Jul 2007 11:55:02 +0000]
Clarify edge-case for utf_char_*() functions where MAX is 0.

If MAX is 0, there's no possible character to read.

Nikolai Weibull [Fri, 27 Jul 2007 11:47:03 +0000]
Don’t mark _unichar_combining_class() as inline.

This function is, as it is currently defined, not eligible for inlining.

Nikolai Weibull [Fri, 27 Jul 2007 11:46:01 +0000]
Add inlining flags and warnings for GCC.

Nikolai Weibull [Fri, 27 Jul 2007 11:45:21 +0000]
Add tests for casing methods.

Nikolai Weibull [Sun, 30 Jul 2006 20:14:55 +0000]
Update the position-calculating code

Saying that the position is the difference between the point we’re at and
the beginning of the string using pointer subtraction is only correct if
we’re talking bytes.  Use utf_pointer_to_offset() instead to give the
character offset.
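
The difference can be sketched as follows. char_offset() is a hypothetical stand-in for utf_pointer_to_offset(), whose real implementation is not shown here; it assumes valid UTF-8 and pointers that sit on character boundaries.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the characters in [str, pos) by counting the bytes that start
 * a character. */
size_t char_offset(const char *str, const char *pos)
{
    assert(str != NULL && pos != NULL && str <= pos);

    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)str;
         p < (const unsigned char *)pos; p++) {
        /* Continuation bytes look like 10xxxxxx; any other byte
         * begins a new character. */
        if ((*p & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For a string that starts with three two-byte characters, pointer subtraction at the fourth character gives 6 (bytes), whereas char_offset() gives 3 (characters).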

Nikolai Weibull [Sat, 29 Jul 2006 14:49:33 +0000]
Add #define for UNICODE_FIRST_CHAR_PART2 to character-tables.h

Also, make sure that this #define is used.

Nikolai Weibull [Sat, 29 Jul 2006 14:37:46 +0000]
Factor out decomposition-to-unichar-array code.

Nikolai Weibull [Fri, 28 Jul 2006 12:29:23 +0000]
Simplify normalization code a bit.

We don’t have to shift down all characters all the time, simply make sure
to copy the ones we want into the slots that exist.

Thanks go out to Lugovoi Nikolai for the suggestion and sample
implementation of this patch.

Nikolai Weibull [Thu, 27 Jul 2006 21:03:43 +0000]
Fix silly remnant from change to normalization code.

The ‘swapped’ parameter was recently added to
unicode_canonical_ordering_swap(), but the function was still pretending to
return whether something was swapped or not.  Simply turn the function into
a void-returning one.

Nikolai Weibull [Thu, 27 Jul 2006 21:00:32 +0000]
Fix typo in PackageFiles definition

The specifications wouldn’t be distributed with the package.

Nikolai Weibull [Thu, 27 Jul 2006 20:59:20 +0000]
Make sure that the “tests/data” directory exists

This makes sure that the “tests/data” directory exists before looking for a
file in that directory.

Nikolai Weibull [Thu, 27 Jul 2006 20:51:15 +0000]
Add normalization methods

It’s time to add normalization methods.  It’s basically String#normalize
that takes an optional parameter to specify what kind of normalization is
desired (:nfc, :nfd, :nfkc, :nfkd).  Also add test cases for this along
with test cases for String#foldcase.

Nikolai Weibull [Thu, 27 Jul 2006 20:20:43 +0000]
Clean up extconf.rb for utf-8 directory

Nikolai Weibull [Thu, 27 Jul 2006 20:15:07 +0000]
Update normalization code to adhere to PR #29.

According to PR #29 [1] there’s a mistake in the Unicode standard.  This
patch updates the normalization code to adhere to the corrected rules given
in that PR.

[1] http://www.unicode.org/review/pr-29.html

Nikolai Weibull [Thu, 27 Jul 2006 17:13:30 +0000]
Upcasing problem due to stupid boolean reversal.

Upper is when it isn’t lower.  It should really not be called upper, but
rather not_lower.

Nikolai Weibull [Thu, 27 Jul 2006 15:23:54 +0000]
Update Unicode-data script to take arguments

The Unicode-data-generating script, generate-unicode-data.rb, had hardcoded
the version of Unicode and the location of the input files.  Take these as
arguments instead.

Also, along with this, update the generated files so that the correct
version is listed.

Nikolai Weibull [Wed, 26 Jul 2006 13:19:47 +0000]
Fix String#upcase in Lithuanian locale.

The problem was that accents were being doubled for ‘i’s in the Lithuanian
locale.  The reason for this was that the pointer into the string being
upcased wasn’t being advanced properly when outputting the correct
combining characters, so previous combining characters (from the lowercase
version) would linger.  The fix was to make sure that the pointer into the
string was advanced properly.

This problem was spotted by Lugovoi Nikolai.

Nikolai Weibull [Wed, 26 Jul 2006 09:52:29 +0000]
Fix two signedness warnings produced by GCC 4.1.

Nikolai Weibull [Wed, 26 Jul 2006 09:37:06 +0000]
Fix bug in String#[] when passing it a Regexp.

The problem was simple: a parenthesis had been moved to the end of the
expression; silly me.  Thanks go to Lugovoi Nikolai for reporting this and
providing a fix.

Nikolai Weibull [Mon, 24 Jul 2006 23:08:12 +0000]
lib/encoding/character/utf-8.rb: Remove .so extension from require line.

Not all platforms use .so as a library extension, so just let Ruby figure
it out for itself.

Nikolai Weibull [Mon, 24 Jul 2006 23:07:05 +0000]
Fix bug in upcasing code that would make upcasing in Lithuanian not work.

Nikolai Weibull [Mon, 24 Jul 2006 12:33:25 +0000]
Remove tags file that was inadvertently included in repository.

Nikolai Weibull [Mon, 24 Jul 2006 12:26:44 +0000]
Make array of word strings const to save some relocation work.

As suggested in [1], use const for the array as well.

(Also, move declaration of Init_utf8() to right before the function itself,
as it’s only there to shut up the compiler about an undeclared extern.)

[1] Ulrich Drepper, “How To Write Shared Libraries”,
    http://people.redhat.com/drepper/dsohowto.pdf.
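
A sketch of the declaration style in question. The word list and accessor are hypothetical; the point is only the second const, which makes the array of pointers itself read-only in addition to the characters it points at.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical word list.  The first const protects the characters,
 * the second protects the array of pointers. */
static const char *const words[] = { "alpha", "beta", "gamma" };

#define WORD_COUNT (sizeof words / sizeof words[0])

/* Hypothetical accessor for the table. */
const char *word_at(size_t i)
{
    assert(i < WORD_COUNT);
    return words[i];
}
```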

Nikolai Weibull [Mon, 24 Jul 2006 11:12:37 +0000]
Remove old Unicode data files (4.1).

These files should probably not have been in the repository at all.

By the way, I forgot to credit Lugovoi Nikolai for finding the last two
bugs (#length and #upcase/#downcase).  Sorry about that.

Nikolai Weibull [Mon, 24 Jul 2006 11:10:08 +0000]
Prevent utf_upcase() and utf_downcase() from casing uncasable characters.

Not all lowercase characters have an uppercase equivalent, and vice versa.
Add a check to make sure that we don’t try to do it anyway.

Nikolai Weibull [Mon, 24 Jul 2006 10:09:14 +0000]
Add a specification for the #length method.

To make sure that we keep to the specification of #length, add a
specification for it.

Nikolai Weibull [Mon, 24 Jul 2006 10:08:27 +0000]
Updated utf_length_n() to count NUL bytes as normal characters.

Previously utf_length_n() would stop at the first NUL byte, even if ‘len’
said that more data was available.  This matches the semantics of, for
example, strnlen(), but Ruby Strings can contain any byte, so we can’t
treat NUL bytes specially when ‘len’ is provided (as opposed to
utf_length(), where it isn’t).  All *_n() functions will therefore have to
be rewritten to handle NUL bytes like any other byte.  We begin with
utf_length_n(), because it is an obvious case where this was causing a
problem.
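
The intended behaviour can be sketched like this. utf8_length_n_sketch() is a hypothetical illustration that assumes valid UTF-8 input; it is not the project's utf_length_n().

```c
#include <assert.h>
#include <stddef.h>

/* Counts the characters in exactly len bytes, treating NUL as an
 * ordinary one-byte character instead of a terminator. */
size_t utf8_length_n_sketch(const char *str, size_t len)
{
    assert(str != NULL || len == 0);

    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        /* Count only bytes that start a character; 10xxxxxx bytes
         * continue the previous one. */
        if (((unsigned char)str[i] & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For the four bytes 'a', NUL, 0xC3, 0xA5 this counts three characters, where a strnlen()-style scan would stop after the first byte.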

Nikolai Weibull [Sun, 23 Jul 2006 22:26:17 +0000]
Rakefile: Simplify TAGS target.

Nikolai Weibull [Sun, 23 Jul 2006 21:50:39 +0000]
Rakefile: Add clean and clobber tasks and fix some small bugs.

The Makefiles won’t always exist, so fake a list of sources that is
probably going to be right if one can’t be extracted from the Makefile.

Nikolai Weibull [Sun, 23 Jul 2006 21:26:16 +0000]
Rakefile: Add tasks to build, install, and uninstall a gem.

Nikolai Weibull [Sun, 23 Jul 2006 20:53:43 +0000]
Indentation fix.

One can still tell what parts originally come from string.c :-).

Nikolai Weibull [Sun, 23 Jul 2006 17:32:17 +0000]
Remove strstr, strnstr, and strrnstr from public interface.

These functions should never have been exported.  Also, their inclusion is
sometimes not necessary so they should really be wrapped up neatly in some
Anyway…

Nikolai Weibull [Sun, 23 Jul 2006 16:23:06 +0000]
Initial commit.