
Sunday, March 11, 2012

Adding Row Numbers to Flat File Source

Hi,

I was wondering if it was possible to add an identity column to a flat file data source as it is being processed in a data flow. I need to know the record number of each row in the file. Can this be done with the derived column task or is it possible to return the value of row count on each row of the data?

Any help on this is greatly appreciated.

Cheers,

Grant

Generating Surrogate Keys
(http://www.sqlis.com/default.aspx?37)

-Jamie

|||

You may have more than one option here; I would try a script component in the data flow that uses a variable as a seed and generates the row number as a new column in the pipeline.

Create an SSIS variable of type Int32, e.g. MyIdentityColSeed, that is set to zero.

Then, in your data flow, after the flat file source, add a script component with one output column (named, for example, MyIdentityCol) and use a script like this:

Private Counter As Integer = 0

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

    ' MyIdentityColSeed must be listed as a read-only variable on the component;
    ' Counter keeps the running row number
    Row.MyIdentityCol = Variables.MyIdentityColSeed + Counter

    Counter += 1

End Sub

Even though the code is very simple, I did not test it, so take the time to debug it.

|||Rafael, Jamie,

Thanks for both your posts on this matter. I'm looking into Rafael's suggestion at present, and I see that the only way to access variables in the data flow section is within the PostExecute phase. The problem with this is that the variable doesn't seem to update as I move through the rows in the data flow. Is this because of how SSIS has been developed? Can you not set a variable anywhere else within the script? I'll look at Jamie's suggestion once this one has been exhausted.

Thanks,

Grant|||

The Pre- and PostExecute limitation is just that. It also makes sense, as you only need to read and write the variables before and after execution, which is faster than accessing them for every row. Use local variables in the actual main execution code (see the sketch below).

You may want to take a quick look at the Row Number Transformation; it may do exactly what you want, with seed and increment settings already built in - http://www.sqlis.com/default.aspx?93
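For reference, here is a minimal sketch of that pattern (not from the original thread): read the seed once in PreExecute, keep a local counter while the rows flow through, and write the final count back in PostExecute. The variable names MyIdentityColSeed (read-only) and MyRowCount (read-write), and the output column MyIdentityCol, are assumed names for illustration.

Private rowNumber As Integer

Public Overrides Sub PreExecute()
    MyBase.PreExecute()
    ' Read the read-only seed variable once, before any rows arrive
    rowNumber = Variables.MyIdentityColSeed
End Sub

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
    ' Only the local counter is touched per row
    rowNumber += 1
    Row.MyIdentityCol = rowNumber
End Sub

Public Overrides Sub PostExecute()
    ' Read-write variables can only be set here
    Variables.MyRowCount = rowNumber
    MyBase.PostExecute()
End Sub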

|||Got Rafael's suggestion to work. I didn't need to use a variable in the end, just a counter that was initialized in the PreExecute event and incremented for each row. Many thanks for everyone's help.

Yet more new things learned today. :)

Thank you all.

Grant|||

Grant Swan wrote:

Rafael, Jamie,

Thanks for both your posts on this matter. I'm looking into Rafael's suggestion at present, and I see that the only way to access variables in the data flow section is within the PostExecute phase. The problem with this is that the variable doesn't seem to update as I move through the rows in the data flow. Is this because of how SSIS has been developed? Can you not set a variable anywhere else within the script? I'll look at Jamie's suggestion once this one has been exhausted.

Thanks,

Grant

Errr...both Rafael's and my suggestions were exactly the same

Glad you got it working anyhow!

-Jamie

|||

Grant,

I think Jamie is right; the approaches are identical. I do not keep records of the web resources I use, so I was unable to post a link to the original source as Jamie did. I guess I owe some credit to the author of the script.

Rafael Salas

|||Jamie,

Sorry, I'd had a previous link open from the same website and was looking at this, which had some resemblance to what I wanted to do:

http://www.sqlis.com/default.aspx?93

As opposed to the link I'd clicked from your post. Apologies, and I'll mark your post as an answer also.

It's been a bad day and I've got so many SSIS things in my head :)

Cheers,

Grant|||

Grant Swan wrote:

Jamie,

Sorry, I'd had a previous link open from the same website and was looking at this, which had some resemblance to what I wanted to do:

http://www.sqlis.com/default.aspx?93

As opposed to the link I'd clicked from your post. Apologies, and I'll mark your post as an answer also.

It's been a bad day and I've got so many SSIS things in my head :)

Cheers,

Grant

Ha ha. Cool. I'm not bothered about getting my "answer count" up, as long as the thread has at least one reply marked as the answer. Thanks anyway, though!

-Jamie

Adding precedence to multiple files

Hi all,

I have a package here which updates a DB from a flat file source. Now the problem is I may get multiple files. I have used a Foreach Loop to handle this; it picks up files based on the file name (the file names have a timestamp in them), but I want to process the files in order of their creation time.

Please help me on this. I have written a script task before the Foreach Loop and I have got the minimum creation date from all the files, but I am not able to go forward from here.

Does anybody have an idea?

AFAIK, the files are shown in creation order in the Foreach Loop. However, I am not sure this is gospel.

You could use a Script Task to load a list of files and sort the list by the file attributes. Using this sorted list, you could then loop over the list using the Foreach Loop container (see the sketch at the end of this thread).

If you wanted to get clever, modify the For Each Directory example in the SQL Server 2005 samples. Create your own file enumerator, ensuring the files are in the order you want.|||Generally, it does sort by filename - but there are no guarantees of this. If you want to ensure the sort order is correct, follow Crispin's advice to modify the For Each Directory sample, or use the sample posted here (http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1439218&SiteID=1) by jaegd. Or read your directory, save each filename to a temp table, and use SQL sorting.
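For reference, here is a minimal sketch of the Script Task approach suggested above (not from the thread). It lists the files in a folder, sorts them by creation time, and stores the sorted full paths in an Object variable that a Foreach Loop can walk with the Foreach From Variable enumerator. The variable names User::SourceFolder and User::SortedFileList and the *.txt mask are assumptions; use whatever your package defines.

Imports System
Imports System.Collections
Imports System.IO
Imports Microsoft.SqlServer.Dts.Runtime

Public Class ScriptMain

    Public Sub Main()
        ' Both variables must be listed on the Script Task (read-only / read-write)
        Dim folder As String = CStr(Dts.Variables("User::SourceFolder").Value)

        ' Collect the files and sort them by creation time, oldest first
        Dim files As FileInfo() = New DirectoryInfo(folder).GetFiles("*.txt")
        Array.Sort(files, New FileCreationComparer())

        ' Hand the sorted full paths to the Foreach Loop (Foreach From Variable enumerator)
        Dim sortedPaths As New ArrayList()
        For Each f As FileInfo In files
            sortedPaths.Add(f.FullName)
        Next
        Dts.Variables("User::SortedFileList").Value = sortedPaths

        Dts.TaskResult = Dts.Results.Success
    End Sub

End Class

' Orders FileInfo objects by their creation time, for Array.Sort
Public Class FileCreationComparer
    Implements IComparer

    Public Function Compare(ByVal x As Object, ByVal y As Object) As Integer Implements IComparer.Compare
        Return DateTime.Compare(CType(x, FileInfo).CreationTime, CType(y, FileInfo).CreationTime)
    End Function
End Class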

Thursday, March 8, 2012

Adding new Guid to destination

Source table has two fields, int and string. Destination has three fields, int, string and Uniqueidentifier (Guid). For each source row added to the destination, I need to add a new Guid. I know how to make a new Guid in VB.NET script, but how do I inject it into the transformation for each row. I see the “foreach loop container” component and I suspect I need to use it, but I am stumped. Any help would be most appreciated

Paul Sullivan

Use a script component in the data flow (added between the source and destination). When you edit it, add your guid column as a new output column, and in the script template write something like

Row.MyGuid = ...|||

Hi,

I have the same task as you have.

Did you make the script component for adding Guid to each source row?

Could you please help me to understand it?

If you could share some source code and some settings for it, that would be great!

Thanks.

|||

Vita wrote:

Hi,

I have the same task as you have.

Did you make the script component for adding Guid to each source row?

Could you please help me to understand it?

If you could share some source code and some settings for it, that would be great!

Thanks.

Imports System
...
Row.MyGuid = System.Guid.NewGuid()

Regards

Jamie
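Filled out, Jamie's snippet amounts to something like the sketch below. It assumes the script component has a new output column named MyGuid of type DT_GUID; the column name is just an example.

Public Class ScriptMain
    Inherits UserComponent

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        ' Generate a fresh GUID for every row flowing to the destination
        Row.MyGuid = System.Guid.NewGuid()
    End Sub

End Class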

|||

I did not think that configuration for it would be so easy.

Thanks a lot.

Tuesday, March 6, 2012

Adding new column to Flat File Source Connection

What is the best way to deal with a flat file source when you need to add a new column? This happens constantly in our Data Warehouse: another field gets added to one of the files to be imported as users want more data items. When I originally set the file up in Connection Managers, I used Suggest File Types, and then made many adjustments to data types and lengths on the Advanced Tab, because Suggest File Types goofs a lot even if you say to use 1000 rows. I have been using the Advanced Tab revisions to minimize the Derived Column entries. The file is imported nightly. Now I have new fields added to this file, and when I open the Connection Manager for the file, it does not recognize the new columns in the file unless I click Reset Fields. If I click Reset Fields, it wipes out all the Advanced Tab revisions! If I don't click Reset Fields, it doesn't seem to recognize that the new fields are in the file.

Is it a waste of time to make Advanced Tab type and length changes? Is it a better strategy to just use Suggest Types, and not change anything, and take whatever you get and set up more Derived Column entries? How did the designers intend for file changes to be handled?

Or is there an easy way to add new fields to this import that I am overlooking? I am finding it MUCH more laborious to set up or to modify a file load in SSIS than in DTS. In DTS, I just edit the transformation, add the field to the Source and Destination lists, and I'm good to go. My boss doesn't understand why a "better" version is taking so much more work!

thanks,

Holly

Ah, well, I have some sympathy for you, but as soon as you said Data Warehouse, I lost that sympathy. The flat file source CANNOT change. SSIS isn't meant to be a be-all/do-all application. There are many downstream metadata components that would surely break if the flat file source could somehow magically understand the new columns. The whole point of SSIS (and many die-hard DTS people will balk at this) is to enforce strict adherence to the way connections are built. Any changes require developer effort to ensure that the changes are handled appropriately. (No guessing!)

Suggest data types is good for a starting point, but you'll still need to go through and make sure that they are correct. The suggest feature does not scan all rows.

The more important question is why are you (the maintainers of the Data Warehouse) allowing for a constantly changing file to be introduced into the system? There should be some sort of production control in place.|||Exactly for the reasons Phil says, I don't use the "flat file" connection in SSIS for imports unless really, really needed.

If it is a standard delimited file, I use a temporary table in the same field structure as the source and the field names for the destination. Use Bulk Insert or BCP to insert into the temp table, then use SSIS to transform/move the data to the ultimate destination. In most cases use
INSERT INTO dest SELECT * from temptable. In other cases, I have to write a dynamic sql statement to read and populate the matching field names between the two tables and insert them.

That way if a new field is added, all you have to do is change the temp table and the dest table. Nothing in SSIS needs to change.

Yes, I know it is slower. But the ease of changing far outweighs the added time it takes.|||

Sorry, I obviously did not explain clearly. The files are not "constantly changing."

Business users submit requests for data warehouse additions. Our data warehouse has expanded constantly over the years. New tables, new fields in existing tables, etc. When these requests are approved, the changes have to be implemented. If the data comes from the mainframe, and if it is logically part of an existing download, the programmer adds the new field/s to the appropriate download creating the extract file. Then the data warehouse administrator has to modify the package to import the new data fields. As businesses change, new regulations, new situations, new data is needed.

I don't mean that one night, suddenly, with no warning, there are new fields in a file. But I'd love to have a dollar for every field I have added to an existing DTS import over the years.

We use DTS for our ETL. I am now converting it to SSIS, but only in the Test environment. So far I have mostly been creating the SSIS packages, but since it takes a while, and other work goes on, I am coming to situations where I have to modify existing SSIS packages that I have already converted, because I have changed the DTS packages they are converting from.

I am beginning to think that I took the wrong strategy in changing types and lengths on the Advanced Tab, because I can't see any way of adding another field or two without wiping out all that work. That was the purpose of my question: what are others in this situation using as their approach?

Thank you for your reply and I would be glad to hear other opinions.

Holly

|||

Tom,

Yes, I can see your point. The ideal thing would just be to let the import file define the file structure of that temporary or prep or stage table, wherever you want to dump the data. I am importing, in most cases, into a "prep" table from the text file. However, the approach to setting up the data structure of the prep tables was defined in 1999 and 2000, the stored procedures that load from them are expecting that structure, and there are hundreds of tables, and I really don't have time to change them and then change the stored procs to deal with the new situation... It seems to be an example of: if you set up a new ETL using SSIS, it would be done very differently from an ETL that was set up under older versions. The conversion requires fixing something, and it's sort of a question of what you decide to fix.

It is interesting to hear the different ways people do things.

Holly

|||

I think a general "best practice" kind of approach to loading files is to always load them to a staging table. The goal of this initial load is that it should contain virtually no logic - it should be so simple that it virtually never fails.

A more drastic way of doing this is to use a "generic loader" setup. This consists of one staging table with enough varchar columns (max size) to match the number of columns in your widest table, plus a "target_table_name" column. You can create a generic file import process that will load every record in every file to this structure, setting the target table column differently for each load. You will have to maintain a mapping of your generic column names to the specific column names of the target table, or you can double up on your columns and add a "Column name" column for each column.

You can change datatypes, etc. when loading the targets from the generic table. This makes it a bit easier to recover from errors due to datatype issues, which is especially common when loading user-maintained Excel files, for example.

Obviously there are some drawbacks to this approach, like the mapping confusion it can create, as well as the fact that your loads tend to get pretty linear in execution. If you try to parallel load a setup like this the table gets thrashed too much and performance suffers. However, these tradeoffs do lead to a lot more flexibility and you have fewer ETL objects to maintain, though the maintenance may be more difficult in some cases.

There are always tradeoffs, but I thought I'd throw the idea out in case it might fit in your situation.

|||Holly,

I know it is a bunch of work to do to reimplement, I too have hundreds of flat files to import. I started this approach a long time ago because I could see this was going to be a problem.

I shortened part of my process in my description. I actually have a stored proc importing the flat files into a temp (staging) database with the table names the same as the flat files. I use a loop for each table, import each flat file (tablename.txt) into the temp table, and process the tables from the "import" database into their ultimate destination.

In your case, if you take my approach you could move them into the stage tables in a stage database, and then insert into your existing "prep" tables for the rest of the processing by the stored proc.

Neither SSIS nor DTS is pretty when it comes to changing file structures for importing. Especially when all you usually need to do is map txtfile.field123 to table.field123.

Good luck|||You may try DataDefractor. It handles input data changes gracefully.

Regards,
Ivan

Saturday, February 25, 2012

Adding Linked server

Hi guys,
My linked server (from one SQL Server 2000 to another SQL Server 2000) is
not working. My settings are as follows:
Checked other data source
Provider string: DRIVER={SQL Server};SERVER=SERVER_NAME;UID=TEST;PWD=TEST;
Under Security tab:
"Be made using the login's current security context" is checked.
Please help!

When you say it's not working, do you mean "insufficient permissions", or are you getting some other message?
--
Jack Vamvas
__________________________________________________________________
Receive free SQL tips - register at www.ciquery.com/sqlserver.htm
SQL Server Performance Audit - check www.ciquery.com/sqlserver_audit.htm
New article by Jack Vamvas - SQL and Markov Chains -
www.ciquery.com/articles/art_04.asp
"David" <David@.discussions.microsoft.com> wrote in message
news:20B29608-CAEF-483A-ADF4-A5DA13B87D5D@.microsoft.com...
> Hi guys,
> My linked server (from one SQL Server 2000 to another SQL Server 2000) is
> not working. My settings are as follows:
> Checked other data source
> Provider string: DRIVER={SQL Server};SERVER=SERVER_NAME;UID=TEST;PWD=TEST;
> Under Security tab:
> Be made using the logins current security context is checked.
>
> Please help!
It says, "SQL Server does not exist or Access denied". Thanks
"Jack Vamvas" wrote:
> When you say it's not working , do you mean "insufficient permissions!? or
> are you getting some othe rmessage?
> --
> Jack Vamvas
> __________________________________________________________________
> Receive free SQL tips - register at www.ciquery.com/sqlserver.htm
> SQL Server Performance Audit - check www.ciquery.com/sqlserver_audit.htm
> New article by Jack Vamvas - SQL and Markov Chains -
> www.ciquery.com/articles/art_04.asp
> "David" <David@.discussions.microsoft.com> wrote in message
> news:20B29608-CAEF-483A-ADF4-A5DA13B87D5D@.microsoft.com...
> > Hi guys,
> >
> > My linked server (from one SQL Server 2000 to another SQL Server 2000) is
> > not working. My settings are as follows:
> >
> > Checked other data source
> > Provider string: DRIVER={SQL Server};SERVER=SERVER_NAME;UID=TEST;PWD=TEST;
> >
> > Under Security tab:
> > Be made using the logins current security context is checked.
> >
> >
> > Please help!
> >
>
>

Adding Indexes to Table

I'm looking for information on how to add indexes to a table in a SQL Server 2000 database, and why to add them, etc. Is there any source of good information on the web regarding this?

Hi,

I've been looking for this kind of info too. I have a feeling, and I think I have read somewhere, that the indexing is more or less automatic in SQL Server 2000, and that's why one can't find out how to do it. However, someone else would need to verify this...

Pettrer

|||

...indexes to a table ... why add them?

Lookup speed. Imagine using a dictionary that wasn't in alphabetical order...

|||And how could I use Enterprise Manager to add them to a table?|||

Hi again,

As I suspected, these are being set automatically for me (in SQL Server Management Studio Express) and are displayed in the "column properties" section. I guess the same goes for Enterprise Manager.

Pettrer

|||

pettrer:

Hi again,

As I suspected, these are being set automatically for me (in SQL Server Management Studio Express) and are displayed in the "column properties" section. I guess the same goes for Enterprise Manager.

Pettrer

I suspect this is a non-clustered index. I was looking at how to explicitly set an index based on the execution plan.

|||

Hi,

To create an index, you will need to use the CREATE INDEX statement.

The syntax can be found from

http://msdn2.microsoft.com/en-us/library/ms188783.aspx

Friday, February 24, 2012

Adding Formula in CRYSTAL REPORT

Hi,

I am using Crystal Reports to get information from the database (Oracle 9).
The source of the data is a script which I added via the Add Command option, using the Oracle OLE DB provider.

I would like to say that I am now trying to implement a CR formula in order to display the
following message:
"NO USERS LOGIN"

I want Crystal Reports to give this message when there is no user connected to the database.

In order to do that, I created the following formula on the formula tab, but it didn't give any
message on the Crystal Report when no user was logged in to the database:

Formula 1

if count ({command.USER}) < 0
then "No User Login to Database"

or

Formula 1

if count ({command.USER}) < 1
then "No User Login to Database"

I tried to drag this formula onto various sections of the report (Detail, Report Header), but it
didn't work.

But when I used the greater-than sign instead of the less-than sign, the formula works and
gives the message.

For Example:

Formula 2

if count ({command.USER}) > 0
then "User Login to Database"

Could someone please tell me what changes I should make to formula 1 in order to display the message
"No User Login to Database" when there is no user logged in to the database?

I would also like to say that the routine I am using in the Crystal Report gives me information
about the various users who are connected to the database.

Thanks

David

I think that's normal, because Crystal creates a connection to your database to verify the number of user connections. So I think that you must put the following formula in your report:

if count ({command.USER}) < 2
then "No User Login to Database"

To verify what I say, just print the field {command.USER} to see its value.|||Hi ariqa,
Use the following formula

if isnull ({command.USER})
then "No User Login to Database"

Madhivanan|||Hi Machuet & Madhi,

Thanks for your cooperation.
Your solution helped me.

Thanks

Regards

David