You can use PROC CASUTIL with a DROPTABLE statement to explicitly remove tables from the memory space. Before you do, you may want to save that table off into the data source area for later reloading. And an excellent format for storing your data is SASHDAT. This is a format that allows the data to be quickly loaded back into memory in parallel just by reading the header. It’s typically much, much faster than transferring data from the client side to the server side. Now, let’s have a look at how we might save a table that is in the memory space of a caslib out to the permanent storage space of that caslib in SASHDAT format. To do this, we’re going to use a code snippet to help us out. First, let’s have a look at the files that already exist in the caslibs we’ve been dealing with, CASUSER and Public. And we can see that the CASUSER library has the expected files, but there is no SALES SASHDAT file. Now, you remember that the SASHDAT file was nice because it made for very, very fast loading of large data sets. Now, SALES isn’t a particularly large data set, but it will do for our demonstration. In the Public memory space, you’ll see that the SALES table exists.
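As a rough sketch of the steps this demonstration walks through, the listing, the save, and a later reload might look something like the code below. The session name mysess is an assumption for illustration, and the code the snippet actually generates in the video may differ. The reload targets CASUSER here purely as an example.

```sas
/* Assumption: "mysess" is an illustrative session name */
cas mysess;

/* See what is on disk in CASUSER and what is in memory in Public */
proc casutil;
    list files  incaslib="casuser";
    list tables incaslib="public";
quit;

/* Save the in-memory SALES table from Public as a SASHDAT file in CASUSER */
proc casutil;
    save casdata="sales" incaslib="public"
         casout="sales.sashdat" outcaslib="casuser" replace;
    /* Confirm that the new file now appears in the caslib's data source */
    list files incaslib="casuser";
quit;

/* Later on: load the SASHDAT file back into memory */
proc casutil;
    load casdata="sales.sashdat" incaslib="casuser"
         casout="sales" outcaslib="casuser" replace;
quit;
```

The REPLACE option is included so the sketch can be rerun; leave it off if overwriting an existing file or table would be a problem.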
We want to take this table and store it back as a SASHDAT file. So I’m going to put my cursor right here in the code window, and I’m going to find a code snippet that will help me save a table to the caslib. Now, if I double-click it, it opens up in a separate window. But if I right-click and choose Insert, it will insert the code into my current program. So the table name that we want to save is SALES, and the caslib that it is in is Public. The caslib where I want to store it is my CASUSER. And we’re going to call it sales.sashdat. Now that file should be available in my CASUSER library. And a quick run of PROC CASUTIL should prove our point. And now we see the sales.sashdat file is stored in our CASUSER library. And then rather than loading from an Excel spreadsheet, next time we can load it in parallel using the SASHDAT file.
Hi. Mark Jordan, again. And in Lesson Three of Programming for SAS Viya, we’ll take a look at our traditional SAS code, DATA Step and SQL, and see how we might have to modify that to run in SAS Viya. Now, if you’re like me, you have extensive experience programming in DATA Step and Base SAS. Are you maybe wondering, have my skills become obsolete? Does everything I know go down the tubes? And that is a good question, but the answer is no. You can use much of the same syntax that you’re very much used to using in DATA Step and run that in CAS at much faster speeds because of the parallel processing capabilities we find in CAS.
Now, the very thing that makes CAS fast also changes, a little, the behavior of programs that we write and run in CAS. So we’ll need to be aware of those changes and figure out how we want to strategize to mitigate the effects that we don’t want while keeping all those high-speed effects that we do. So to understand what causes the difference: a traditional DATA Step is a single-threaded process. It reads the data sequentially, one row at a time. And the instructions that you write in a DATA Step are really instructions on what to do to each row of data. And so this concept of sequential single-threaded processing shouldn’t come as a surprise. But in CAS, we have more than one worker that can be executing that same code at once. And so the data needs to be distributed amongst the workers so that we can maximize our throughput. And CAS and Viya take care of all that stuff automatically for us. And this means that our DATA Step will now run in multiple threads instead of in a single thread. Now, the data is partitioned out into blocks. And this means that not every thread is going to receive the same number of rows of data. Another interesting thing about it is that when you have multiple people working on the same thing in parallel, some will finish quicker than others. So the order in which you get things back from the processors is not always the same every time you run the process.
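One way to see this uneven distribution for yourself is to have each thread count the rows it processed. The sketch below is only an illustration and not part of the original demonstration: the session name mysess, the libref mycas, and the choice of SASHELP.CARS as a test table are all assumptions. It relies on the automatic variables _THREADID_ and _NTHREADS_, which are available to a DATA Step running in CAS, and on the fact that END= becomes true at the last row each thread reads.

```sas
/* Assumptions: "mysess" and "mycas" are illustrative names,
   and SASHELP.CARS stands in for any table you care to load. */
cas mysess;
libname mycas cas caslib="casuser";

/* Put a small test table into CAS memory */
proc casutil;
    load data=sashelp.cars outcaslib="casuser" casout="cars" replace;
quit;

/* Each thread runs its own copy of this step on its share of the rows */
data mycas.rows_per_thread;
    set mycas.cars end=last_row;
    row_count + 1;                    /* running count, kept separately per thread */
    if last_row then do;              /* true at the last row THIS thread reads */
        thread        = _threadid_;   /* which thread produced this row */
        total_threads = _nthreads_;   /* how many threads the step ran in */
        output;
    end;
    keep thread total_threads row_count;
run;

proc print data=mycas.rows_per_thread;
run;
```

How many rows come back, and how the counts are split, depends on the deployment and on how the table happens to be distributed, so the result can differ from one environment or run to the next.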
We’ll look at some of these examples and how, in some cases, the parallel processing is completely transparent to us. In other situations, we’ll need to take some action because of the way that threads produce their results. Let’s start with a simple Base SAS DATA Step program using a single thread. And then we’ll run the same thing in CAS in Viya and check it out. So the DATA Step is used to manage or manipulate tables in preparation for further analysis, generally with some type of analytical procedure in SAS. We can modify the values that already exist in columns. We can compute new columns, and we can conditionally process and produce extra rows or only the rows that we desire. And we can combine tables in a DATA Step. Here’s a relatively simple DATA Step. It reads in a data set called MY CUSTOMERS, and it writes out a data set called DEPARTMENTS. All we’re doing is checking the value of CONTINENT to set the appropriate value for the DEPARTMENT.
If both the output table and input table in a DATA Step are CAS tables, then this DATA Step will run in CAS automatically without any further need for me to do anything to my code. Now let’s take a look at how to take an existing DATA Step program and modify it so that it can run in CAS. The big thing that we’re going to have to remember is that the DATA Step has to read from a CAS in-memory table before it can run in CAS. And ideally, any data that it writes would also be written out to a CAS table. So first, let’s take a look at the DATA Step program and see how it works. The program creates an output data set called WORK.DEPARTMENTS by reading in the customer data set from our SAS library. The SELECT group sets the value of DEPARTMENT depending on the CONTINENT in the particular record. And the output data set will only contain the variables CITY, CONTINENT, and DEPARTMENT.
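To make the before and after concrete, here is a minimal sketch of a program of this shape, first as a Base SAS step and then pointed at CAS tables. It is not the program from the video: the sample rows, the department values, the session name mysess, and the libref mycas are all hypothetical and are there only so the example can be run on its own.

```sas
/* Hypothetical sample data; the real customer table is not shown in the transcript */
data work.customers;
    infile datalines dsd truncover;
    length City $ 20 Continent $ 20;
    input City $ Continent $;
    datalines;
Nairobi,Africa
Tokyo,Asia
Lyon,Europe
Denver,North America
;
run;

/* Base SAS version: single-threaded, reads and writes WORK tables */
data work.departments;
    set work.customers;
    length Department $ 20;
    select (Continent);
        /* Department values below are made up for illustration */
        when ('Africa', 'Europe') Department = 'EMEA Sales';
        when ('Asia')             Department = 'APAC Sales';
        otherwise                 Department = 'Americas Sales';
    end;
    keep City Continent Department;
run;

/* CAS version: same logic, but input and output are both CAS tables */
cas mysess;                           /* assumption: illustrative session name */
libname mycas cas caslib="casuser";   /* assumption: illustrative libref */

proc casutil;
    load data=work.customers outcaslib="casuser" casout="customers" replace;
quit;

data mycas.departments;
    set mycas.customers;
    length Department $ 20;
    select (Continent);
        when ('Africa', 'Europe') Department = 'EMEA Sales';
        when ('Asia')             Department = 'APAC Sales';
        otherwise                 Department = 'Americas Sales';
    end;
    keep City Continent Department;
run;
```

The only changes between the two steps are the librefs on the DATA and SET statements. After running the second step, the log should indicate whether it ran in Cloud Analytic Services, which is a quick way to confirm that the step really executed in CAS rather than in the client session.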