Category Archives: Technology

Talend Open Studio for Data Integration: Lesson 2 – Creating a MS SQL Connection

The first thing you need to do once you have a Talend Project created is define your connections that data will be extracted from or where it will be written to.

Microsoft SQL Server is a good example of a database that many users will need to work with in ETL, so we will cover how to add that to your project in Talend.

1.  Create the Connection in Talend Repository

In your open project, look for the Repository.  This is where you will find the many different objects available within your Talend project.  To add a connection, expand the Metadata category within the repository, then right-click on “Db Connections” and choose “Create Connection”.

Step 1 in Creating a Microsoft SQL Connection in Talend ETL
In the Talend project repository, right click on Db Connections under Metadata to add a connection to MS SQL

 

2.  Enter MS SQL Connection Name

You will then be prompted with a dialog box that will walk you through creating your connection.  On the first screen, labeled step 1/2, enter at least a name for your connection and then click Next.

Step 2 in Creating a Microsoft SQL Connection in Talend ETL
Enter a name for your Microsoft SQL connection in Talend ETL

3.  Enter MS SQL Connection Details

Next, enter the details needed to define your connection.  Make sure to use SQL Server authentication by entering a Login and Password.  Currently, Windows Authentication with SQL Server 2008 does not work reliably.  After you have entered the details, click “Finish”.

Step 3 in Creating a Microsoft SQL Connection in Talend ETL
Enter the MS SQL connection details and click Finish

 

4.  Verify

You will now see your connection under Metadata in the Talend Repository.

Adding a MS SQL connection in Talend Open Studio ETL
You will now see your Microsoft SQL connection created in Talend ETL

Talend Open Studio for Data Integration: Lesson 1 – Creating a Project

The first step in creating an ETL project in Talend Open Studio for Data Integration is to create the actual project definition. The project serves as a container for resources, including database connections, job scripts and contexts. When the application is launched, it will use a default folder in the installation directory named workspace. When a project is created, the project exists in a sub-directory in the workspace directory.

See the screenshot from Talend Open Studio for Data Integration 5.1:

Notice we have typed in a name of “OurFirstProject” in the Create a New Project field.  When we click the Create… button we get a dialog box that comes up and allows us to add a Project Description.  Clicking Finish then saves our project and gives us a new view in Open Studio as seen below.

At this point we can highlight the project we want and click Open, or we can use any of the other features.  When you arrive at this screen as the opening screen, simply click Create… to start a new project.

Lesson 4: Creating an HTML File

Even if you know how to write HTML and have all of the tags memorized, you still need to know how to create a file that the web server will access. When a web server service is installed on a workstation, it will generally create a directory to hold your web server documents.

If using Microsoft Internet Information Services, and you have installed the web server service, then you should find a directory called “wwwroot” on your hard drive. The options for this server are usually found by going to Control Panel/Administrative Tools/Internet Information Services Manager. There you can do things like configure IIS, set the home directory and set the permissions.

If you are using Apache, you will find an httpd.conf file in the location where you installed Apache. This file is going to control the Apache web server and setup things like the home directory. By default, Apache uses a directory called htdocs for the web server directory.

Ok, so you have figured out where to put your documents. Now, how do you create the file that the web server will use? The easiest way is to simply create a new file on your computer called index.htm or index.html. Often, file extensions are hidden, so you need to make sure your file is actually index.htm rather than “index.htm.txt”. Using Notepad is a good way to get started, but as you write more complicated code, an editor with syntax highlighting will help by color coding your code.

Now to make our first HTML file, we will take the entire code segment found in Lesson 3 and paste it into our blank index.htm file. Save this file to your web server directory. If this web server is running on your own computer, you can probably just type localhost into your web browser and you should be redirected to your web server which will display your default page.

By naming your document index.htm chances are good that you will be making it your default page, but we will get more into default pages in Lesson 5 and 6 where we will cover IIS and Apache configurations respectively. See you soon!

Lesson 3: Beginning HTML

If you didn’t check out the last tutorial, quickly check it out. We are going to pull apart the HTML code that we found in that lesson to give a brief primer on HTML.

HTML stands for Hyper Text Markup Language. It is a language that is based on control statements, basically commands wrapped around text to tell the web browser how to display the web pages. Without many of the HTML commands that we have available to us, web pages would just display as text. Fortunately, using the generic commands that many web browsers interpret the same way, we can make functional and graphical web pages.

In HTML, the commands are called tags. HTML tags tell the browser what to do with the data returned from the web server. Let’s take a look at our example from Lesson 2. The entire text is here:

<html>
   <head>
      <title>My First Webpage</title>
   </head>
   <body>
      <p>Not going to say Hello World!</p>
   </body>
</html>

That would actually display a page with just some text that says “Not going to say Hello World!” and the title shown by the browser changes to “My First Webpage”.

So let’s review the various parts, keeping it in the hierarchy found. First of all we have the beginning and closing HTML tags.

<html>
</html>

These tell the browser that the markup found in the text is in the HTML format. Every HTML page has an opening html tag at the beginning and a closing html tag at the end. The second tag, with the forward slash, is called a closing tag. This tells the browser that it has reached the end of the segment that matches the name in the closing tag. All text and tags found between an opening tag and its closing tag are considered to be “nested” between those tags.

Let’s look at the next part. The head tag.

   <head>
      <title>My First Webpage</title>
   </head>

The head tag always goes next after the opening html tag. The head is the header section and contains information for the browser about the page, as well as much of the scripting instructions that might appear on the page. We will get into that much later, but for now just think of the head section as definitions. For example, the title tag, which is nested in the head section, tells the browser the name of the specific page that the user is on.

Next comes the body. Let’s look at how we wrote that code:

   <body>
      <p>Not going to say Hello World!</p>
   </body>

The body section always follows the head section. The body contains the actual data that is to be displayed on the web page and tells the browser how the page should look. Text, graphics, and links can all be found in the body section. In this example, we are including the p tag, which tells the browser to display a block of text as a paragraph. Notice the closing tag both for the p tag and the body tag.

That is it for this lesson. Congratulations, you are getting closer to creating your own web page! Next lesson, we will talk about taking this code, storing it in a file and viewing it in your web browser.

Lesson 2: Web Languages

If you have been looking into doing some web programming, then you probably have come across the great number of web programming languages available.  When you are designing a website, you write code into a file that gets processed by the web server.  Review Lesson 1 if you are unsure of what a web server is.  When it comes to web languages, there are two categories of languages.

The first category is client side.  When you are writing client side programming, you are creating the parts of the web page that the user sees when they access your web page.  This includes things such as text, graphics, animations and forms for them to enter data.  Some examples of client side programming languages include HTML and JavaScript.

The second category, and much more diverse in languages, is server side.  Server side programming processes certain events that occur while a web page is being viewed.  For example, if you submit a form, server side programming may save the data you entered into a database.  Or, server side programming might be used to display text on a web page relevant to the specific user.  Server side programming includes processes that occur on the server which format the page that is returned to the user and allow for more dynamic and data driven pages.  Some examples of server side languages are ASP.net, PHP, and JavaServer Faces.
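To make the idea of server side programming a little more concrete, here is a minimal sketch written in JavaScript.  JavaScript is not one of the server side languages listed above; it is used here only so the idea can be shown in a few lines.  The function names renderGreeting and escapeHtml and the user name “Pat” are made up for illustration.  The point is simply that the server builds a different page for each user before sending it back.

```javascript
// Hypothetical helper: replace characters that have a special meaning in HTML
// so that a user name cannot inject its own tags into the page.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical server side step: build the page returned to one specific user.
function renderGreeting(userName) {
  return "<html><body><p>Welcome back, " + escapeHtml(userName) + "!</p></body></html>";
}

console.log(renderGreeting("Pat"));
```

The browser only ever receives the finished HTML; the work of filling in the user name happens on the server.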

When you begin writing code for a webpage, you will begin programming for client side at first.  With client side programming, any web server is capable of providing the pages to the users on the internet.  When you get into server side programming though, you will need specific web server types for the various programming languages.  For example, programming in ASP.net requires Microsoft Internet Information Services.  PHP can run on IIS, but can also run on Apache.  JavaServer Faces can run on a JBoss Application Server or a GlassFish server.  You will need server software that can handle the commands in the server side language.  That topic is further down the line though, and we will be focusing on client side programming first.

Here, for your viewing and curiosity’s sake, is a sample of very basic HTML programming:

<html>
   <head>
      <title>My First Webpage</title>
   </head>
   <body>
      <p>Not going to say Hello World!</p>
   </body>
</html>

In the next lesson, the various parts of the above will be explained. Please check it out!

Lesson 1: Web Pages

The most basic thing that you need to know about when it comes to programming web sites is how a user on a computer somewhere in the world even accesses your web page.   When a computer is connected to the internet, wherever it connects tells it the location of a few places where it can look up addresses.  The complicated name for this is DNS, but think of it like this:  You are looking for a business and someone hands you a phone book (if you even know what that is).  You now have a list of addresses and you can find the location of the business you are looking for.

On your computer, even though you are connected to the internet, when you want to find a website, like teechmi.com, your computer or device does not know how to get to the correct location.  In this case instead of a business, you are looking for a computer somewhere on the internet that has the data for teechmi.com.  This is called a web server.  However, your internet connection usually provides you with a few “name servers” (aka phone books) that tell your computer how to get to the web site you are looking for.  Once your computer knows the address, it sends the request, and the various pipes that your request travels down are up to the “internet” and not your computer.
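The phone book idea can be sketched in a few lines of JavaScript.  This is only a toy model and not how DNS is really implemented: the nameServer table, the lookUp function and the address 203.0.113.10 (taken from a range reserved for documentation) are all made up for illustration.

```javascript
// Toy "phone book": maps a site name to the address of its web server.
// The address is a placeholder from a documentation-only range, not the real one.
const nameServer = new Map([
  ["teechmi.com", "203.0.113.10"]
]);

// Look up a name, the way your computer asks a name server for an address.
function lookUp(name) {
  const address = nameServer.get(name.toLowerCase());
  if (address === undefined) {
    return null; // the phone book has no entry for this name
  }
  return address;
}

console.log(lookUp("teechmi.com"));
console.log(lookUp("not-listed.example"));
```

Once the lookup returns an address, your computer knows where to send its request for the page.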

That’s how a web site is retrieved, but what does it take to hold the data for a website?  First off, you need a web server.  This not only has all of the web pages that you have created, but some sort of software that provides those web pages when a request is made for them.

The two most popular web server software packages in use today are:

Internet Information Services (IIS) which is part of certain versions of Windows software
Apache which is a free open source web server

Once you have one of these software packages running and some web pages your computer can be called a web server (well, as long as it’s connected to the web!).  We will get into more details of how to make it so your web server can be found on the internet in the future.  That is about all you need to start for this lesson, helping you understand how a computer gets to a web page.

HTML and JavaScript – Repeating Anonymous Function

This post demonstrates how to use an anonymous function on an html page in order to create a looping action. It is a relatively simple process and the following code is all that is needed to see this in action.

The comments should be pretty self explanatory in this code to see what is being done. Try it yourself!

Notice a few things. If you would like to end or clear the interval you set in JavaScript, use the clearInterval function. In the line where you call setInterval, make sure you pass the name of the function you would like to repeat; in this example the name is render.

Should be straightforward. A challenge is for you to use an onClick event to begin and end the interval that you create.

<html>
	<head>
		<title>Repeating Anonymous Function</title>
	</head>
 
	<body>
		This is my text.  I'm going to repeatedly pop-up a message box that tells you how many times it's popped up.
 
	<script	type="text/javascript">
 
		//Our anonymous function declaration (at the end of the html document)
		(function() {
 
			//Set the timer variable
			var intTime = 5000;
			//Create the interval variable;
			var myInterval;
			//Create a counter
			var count = 0;
 
			//Create the interval with render being the function to call and intTime being the interval
			myInterval = setInterval( render, intTime);
 
			//Here is the function that our timer will call
			function render() {
				count += 1;
				alert("Number of Cycles: " + count);
 
				if (count == 10) {
					//The following ends the execution of the interval
					clearInterval(myInterval);
				}
			}
		//Complete our anonymous function
		})();
 
	</script>
	</body>
</html>
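
One possible approach to the challenge above, offered as a sketch and not the only answer: wrap setInterval and clearInterval in a small object with start and stop methods, then call those methods from onclick handlers.  The names createRepeater, start, stop and isRunning, and the button markup shown in the comments, are made up for this example.

```javascript
// Returns an object that can start and stop a repeating call to callback.
function createRepeater(callback, intTime) {
  var myInterval = null; // null means no interval is currently running

  return {
    start: function () {
      // Only create an interval if one is not already running
      if (myInterval === null) {
        myInterval = setInterval(callback, intTime);
      }
    },
    stop: function () {
      if (myInterval !== null) {
        clearInterval(myInterval);
        myInterval = null;
      }
    },
    isRunning: function () {
      return myInterval !== null;
    }
  };
}

var count = 0;
var repeater = createRepeater(function () {
  count += 1;
  console.log("Number of Cycles: " + count);
}, 5000);

// In a page, buttons could be wired up like this (hypothetical markup):
// <button onclick="repeater.start()">Start</button>
// <button onclick="repeater.stop()">Stop</button>
```

Checking for null before calling setInterval keeps repeated clicks on the start button from stacking up several intervals at once.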

Talend Open Studio 5.2.0M4

For those awaiting release level 5.2 of Talend Open Studio for Data Integration, there is now a milestone release available from Talend’s website. For those awaiting a fix for the problems with connecting to SQL Server 2008 R2 using Windows Authentication, sorry to say but the problem is not corrected.

A solution remains to simply use SQL Server authentication, and that works, but this seems to be a bug specifically in adding the connection and retrieving the schema.  If you create the connection and it errors, and you then manually define a schema for a table stored on your SQL Server 2008 R2 database, it will actually query the data successfully.  This means that they have attempted to make Windows Authentication work, but are not fixing certain bugs associated with it.

Although using SQL Server authentication is a valid work around, it is not an acceptable method to rely on long term.  Hopefully Talend still chooses to address this.

Basic HTML5 Canvas Tutorial

Have you yet to play around with the canvas tag available in HTML5?  Need help just getting started with a basic document so that you can begin manipulating it in different ways?

The canvas element is a very useful tool to understand in web development.  It offers great potential to web graphics, animation and even to overall page layout.  Animate layers on your webpage to make it stand out or even create a game. The possibilities are extensive.

See the following code example to get started with a basic canvas element.  The comments should clearly highlight what the various lines of code are accomplishing.  Hopefully this helps you get a good start towards utilizing this excellent tag in HTML5.

<html>
 
	<head>
		<script type="application/javascript">
		//Create a function to call after page load to draw a rectangle on
		//our canvas element
		function createRect() {
			//Reference to our canvas element
			var myCanvas = document.getElementById("myCanvas");
			//Create the context for which we will manipulate our canvas
			var myContext = myCanvas.getContext("2d");
			//Create a filler for what we draw in our context
			myContext.fillStyle = "rgb(0, 0, 255)";
			//Create a shape to display
			myContext.fillRect(20, 20, 100, 150);
		}
		</script>
	</head>
	<!-- After the body loads, call our function to draw our rectangle -->
	<body onload="createRect()">
		<!-- Create the canvas element on our page -->
		<canvas id="myCanvas" width="800" height="600">help</canvas>
	</body>
 
</html>
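
Each color channel in an rgb() value runs from 0 to 255.  A small helper can keep values inside that range before they are assigned to fillStyle.  This is only a sketch, and the names clampChannel and rgbString are made up for this example.

```javascript
// Round one color channel to a whole number and clamp it into the 0-255 range.
function clampChannel(value) {
  return Math.min(255, Math.max(0, Math.round(value)));
}

// Build a CSS color string suitable for assigning to myContext.fillStyle.
function rgbString(red, green, blue) {
  return "rgb(" + clampChannel(red) + ", " + clampChannel(green) + ", " + clampChannel(blue) + ")";
}

console.log(rgbString(0, 0, 256)); // prints "rgb(0, 0, 255)"
```

Inside createRect you could then write myContext.fillStyle = rgbString(0, 0, 255); to get the same blue fill.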

That’s all there is to it!

Reduce Transaction Log File Size on SQL Server 2008

Do you have a transaction log file on SQL Server 2008 (seen in Explorer as the LDF file) that has grown beyond the desired size limit?  Maybe you are attempting to issue a checkpoint command or a shrinkfile command, but it is not working to reduce the file size.

Possibly you are getting the following error:  Cannot shrink log file 2 (<your_log_file_name>) because the logical log file located at the end of the file is in use.

The simple answer is your log file is full.  Shrinking will require some free space in the log file to perform the operation.  If you have already attempted to resolve this with the common instructions of changing the database from full recovery mode to simple recovery mode, but it hasn’t helped, then there is one more thing you may need to do.

When the recovery mode is set to simple, a checkpoint command can be issued that will reduce the transaction log and remove the definition of previous statements.  There is something that will cause the transaction log to not self-maintain though, even in a simple recovery mode.   You need to check if the database is waiting on a specific action before the log file can be reduced.

To find out if the database is waiting on an action, you can run the following query:

SELECT name, log_reuse_wait_desc FROM sys.databases;

Find your database in the list and see what the description says.  If it says “NOTHING”, then you should be OK to issue a checkpoint command.  If it says “LOG_BACKUP”, which is likely if the checkpoint command is not working, then the database needs the transaction log backed up before it can reuse the space.

The basic answer is you need the transaction log backed up, but you may have already tried that.  If you are receiving the error that a previous backup does not exist, then you will need to perform a full backup of your database.

If you have not already tried the log backup, the following command can quickly back it up for you to disk:

BACKUP LOG <your_database_name> TO DISK = 'C:\temp\backup.bak'
GO

After that is complete, you should be able to issue a checkpoint and then a shrink file.  The following example will shrink the log file down to 50 MB

USE <your_database_name>
GO

CHECKPOINT;

DBCC SHRINKFILE(<your_log_file_name>, 50);
GO

That should do it.  The main key to remember if you’ve exhausted other resources is to perform a full backup of your database before trying anything else.  This seems to be the only way to get the log file shrunk when it is full.