Discussion:
[NOVICE] Problems with ñ and tildes / CSV import problems in PostgreSQL 9.1
Michael Swierczek
2013-02-07 18:02:05 UTC
Keeping the names intact would be helpful. Whatever I change it to, I receive the same error because of the first entry.
I've encoded the csv using Notepad++ to UTF8 and still no luck.
I think "á" followed by the next 2 characters causes the problem. Is there a better encoding for special characters? Is this possible in WIN-1252?
Zach,
I've been bitten by this misunderstanding myself. Changing the file
encoding in Notepad++ just changes a few bytes at the very beginning
of the file to indicate that it's supposed to be read as your new
encoding. It does not automatically go through the file converting
characters like "à" from the single byte 224 (decimal, 0xE0) used by
LATIN1 to the two-byte UTF-8 sequence 0xC3 0xA0 that encodes U+00E0.
Maybe some other text editors support actually re-encoding the
characters in the file for you, I don't know.
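
For readers who would rather re-encode the file itself than change client_encoding, here is a minimal Python sketch of the conversion described above. It assumes the CSV really is LATIN1. The function names and file paths are hypothetical and not from the thread.

```python
from pathlib import Path


def latin1_to_utf8(data: bytes) -> bytes:
    """Decode bytes as LATIN1, then encode the same text as UTF-8."""
    return data.decode("latin-1").encode("utf-8")


def reencode_file(src: str, dst: str) -> None:
    """Write a UTF-8 copy of a LATIN1 file (hypothetical helper)."""
    # Working on raw bytes leaves line endings untouched.
    Path(dst).write_bytes(latin1_to_utf8(Path(src).read_bytes()))


# "à" is one byte (0xE0, decimal 224) in LATIN1 and two bytes in UTF-8.
print(latin1_to_utf8(b"\xe0").hex())  # c3a0
```

A file converted this way would be loaded with client_encoding set to UTF8. The unconverted original still needs LATIN1.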

Good luck,
-Mike Swierczek
--
Sent via pgsql-novice mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-novice
Zach Seaman
2013-02-07 17:59:08 UTC
Ok, client encoding is back to LATIN1.

Do I have to sacrifice the readability of these names or is there a way to
work around this invalid byte sequence problem?
I changed from LATIN1, set my database to UTF8, and my client_encoding is
UTF8.
ERROR: invalid byte sequence for encoding "UTF8": 0xe17320
ás[space]
No, the client encoding needs to be LATIN1 to read this file.
regards, tom lane
--
Zach Seaman
GIS Expert, IRRI-México
Master of Regional & Community Planning
m 55.2247.1740 (México)
m 01.913.4860.832 (U.S.)
Tom Lane
2013-02-07 17:51:15 UTC
I changed from LATIN1, set my database to UTF8, and my client_encoding is
UTF8.
ERROR: invalid byte sequence for encoding "UTF8": 0xe17320
ás[space]
No, the client encoding needs to be LATIN1 to read this file.

regards, tom lane
Zach Seaman
2013-02-07 17:05:19 UTC
Keeping the names intact would be helpful. Whatever I change it to, I
receive the same error because of the first entry.

I've encoded the csv using Notepad++ to UTF8 and still no luck.

I think "á" followed by the next 2 characters causes the problem. Is there
a better encoding for special characters? Is this possible in WIN-1252?
I changed from LATIN1, set my database to UTF8, and my client_encoding is
UTF8.
ERROR: invalid byte sequence for encoding "UTF8": 0xe17320
ás[space]
Is it a trial and error type problem now?
So - the problem may be that 0x e1 73 71 truly is not a valid UTF-8
character in the current iteration of PostgreSQL - or at all.
Of course it isn't, which is why Postgres is complaining. Presumably
what that data really is is three characters (looks like "ásq") in
LATIN1. But Postgres is trying to interpret it in UTF8. As mentioned
upthread, the solution is to adjust the client_encoding setting before
running the COPY command.
regards, tom lane
Zach Seaman
2013-02-07 16:51:23 UTC
I changed from LATIN1, set my database to UTF8, and my client_encoding is
UTF8.


ERROR: invalid byte sequence for encoding "UTF8": 0xe17320
ás[space]

Is it a trial and error type problem now?
So - the problem may be that 0x e1 73 71 truly is not a valid UTF-8
character in the current iteration of PostgreSQL - or at all.
Of course it isn't, which is why Postgres is complaining. Presumably
what that data really is is three characters (looks like "ásq") in
LATIN1. But Postgres is trying to interpret it in UTF8. As mentioned
upthread, the solution is to adjust the client_encoding setting before
running the COPY command.
regards, tom lane
Tom Lane
2013-02-07 16:15:27 UTC
So - the problem may be that 0x e1 73 71 truly is not a valid UTF-8
character in the current iteration of PostgreSQL - or at all.
Of course it isn't, which is why Postgres is complaining. Presumably
what that data really is is three characters (looks like "ásq") in
LATIN1. But Postgres is trying to interpret it in UTF8. As mentioned
upthread, the solution is to adjust the client_encoding setting before
running the COPY command.
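
The point about the three bytes can be checked outside the database. The following Python sketch is only an illustration, not part of the original reply. It decodes the bytes from the error message both ways, and the variable names are made up for the example.

```python
# The bytes reported in the error message: 0xe17371
raw = bytes.fromhex("e17371")

# In LATIN1 every byte is exactly one character.
as_latin1 = raw.decode("latin-1")

# In UTF-8, 0xE1 announces a three-byte sequence whose next two bytes
# must be continuation bytes; 0x73 and 0x71 are plain ASCII letters.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(repr(as_latin1), utf8_ok)  # 'ásq' False
```

The same text stored as genuine UTF-8 would be the four bytes c3 a1 73 71.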

regards, tom lane
Zach Seaman
2013-02-07 16:01:28 UTC
I'm running PostgreSQL 9.1
I'm fairly new to PostgreSQL 9.1 but I need it, so here I am.
This is a similar question to this one, so I have encoded a database with
LATIN-1 as suggested but can't copy a CSV file into a table within the
database.
well, that mail is from 2005... what version of postgres are you running
at?
ERROR: invalid byte sequence for encoding "UTF8": 0xe17371
SET client_encoding TO UTF8;
before running the copy command, or maybe set to LATIN1
--
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación
Phone: +593 4 5107566 Cell: +593 987171157
Ken Benson
2013-02-07 15:24:14 UTC
I think the problem may be that specific character translation.

The chart I typically use is here:
http://www.utf8-chartable.de/unicode-utf8-table.pl

The 'valid' UTF-8 codes jump from 0x e0 bf bf (at the bottom of this
page: http://www.utf8-chartable.de/unicode-utf8-table.pl?start=3840 )
to 0x e1 80 80 (at the top of this page:
http://www.utf8-chartable.de/unicode-utf8-table.pl?start=4096 ).

So - the problem may be that 0x e1 73 71 truly is not a valid UTF-8
character in the current iteration of PostgreSQL - or at all.
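
One way to see why the chart lists e0 bf bf and e1 80 80 but never e1 73 71: in UTF-8, a lead byte in the range 0xE0-0xEF must be followed by two continuation bytes in the range 0x80-0xBF. The Python sketch below checks only that structural rule. It is a simplification with hypothetical function names, and it does not reject overlong forms or surrogates, which a full validator would.

```python
def is_continuation(byte: int) -> bool:
    """UTF-8 continuation bytes have the bit pattern 10xxxxxx."""
    return 0x80 <= byte <= 0xBF


def looks_like_three_byte_utf8(seq: bytes) -> bool:
    """Structural check only: 0xE0-0xEF lead byte plus two continuation bytes."""
    return (
        len(seq) == 3
        and 0xE0 <= seq[0] <= 0xEF
        and is_continuation(seq[1])
        and is_continuation(seq[2])
    )


for hex_bytes in ("e0bfbf", "e18080", "e17371"):
    print(hex_bytes, looks_like_three_byte_utf8(bytes.fromhex(hex_bytes)))
```

The first two sequences pass and the third fails, because 0x73 and 0x71 are below 0x80.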

Just my thoughts.

Ken
I'm fairly new to PostgreSQL 9.1 but I need it, so here I am.
This is a similar question to this one, so I have encoded a database with
LATIN-1 as suggested but can't copy a CSV file into a table within the
database.
well, that mail is from 2005... what version of postgres are you running at?
ERROR: invalid byte sequence for encoding "UTF8": 0xe17371
SET client_encoding TO UTF8;
before running the copy command, or maybe set to LATIN1
Jaime Casanova
2013-02-07 15:03:55 UTC
I'm fairly new to PostgreSQL 9.1 but I need it, so here I am.
This is a similar question to this one, so I have encoded a database with
LATIN-1 as suggested but can't copy a CSV file into a table within the
database.
well, that mail is from 2005... what version of postgres are you running at?
ERROR: invalid byte sequence for encoding "UTF8": 0xe17371
run:

SET client_encoding TO UTF8;

before running the copy command, or maybe set to LATIN1
--
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación
Phone: +593 4 5107566 Cell: +593 987171157
Gurjeet Singh
2013-02-07 03:12:43 UTC
I'm fairly new to PostgreSQL 9.1 but I need it, so here I am.
so I have encoded a database with LATIN-1 as suggested but can't copy a CSV
file into a table within the database.
ERROR: invalid byte sequence for encoding "UTF8": 0xe17371
Googling doesn't get me anywhere and I am working with Spanish characters.
I think the data in your CSV file should match the client_encoding
parameter.

What is your client_encoding parameter set to?

show client_encoding;
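
To find out which side of that comparison the CSV file falls on, a rough check is to test whether its bytes are valid UTF-8 at all. This Python sketch is one possible way to do that, with a hypothetical helper name. It reports where the first invalid sequence starts and shows the bytes there, much like the PostgreSQL error message does.

```python
def first_invalid_utf8(data: bytes):
    """Return (offset, up to 3 offending bytes) for the first invalid
    UTF-8 sequence in data, or None if all of it decodes as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, data[exc.start:exc.start + 3]
    return None


# A LATIN1-encoded row fails the check; the same text in UTF-8 passes.
print(first_invalid_utf8("ásq".encode("latin-1")))
print(first_invalid_utf8("ásq".encode("utf-8")))
```

A None result suggests client_encoding UTF8 is a reasonable match. A failure means the file is in some single-byte encoding, but this check alone cannot tell LATIN1 from WIN1252, because decoding as LATIN1 never raises an error.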
--
Gurjeet Singh

http://gurjeet.singh.im/